PRECIS

RESEARCH  PROGRESS  REPORT 


Title:  "Annual  Progress  Report:  Automatic  Indexing  and  Abstracting,  Part  II.  English 
Indexing  of  Russian  Technical  Text,"  H.  R.  Robison,  Annual  Progress  Report,  Part  II, 
Office  of  Naval  Research,  Contract  Nonr  4440(00) 

Background: This investigation is concerned with the development of automatic indexing,
abstracting, and extracting systems. Basic investigations in English morphology,
phonetics, and syntax are pursued as necessary means to this end.

Condensed  Report  Contents:  The  following  report  describes  a  computer  system  for 
the  IBM  7094  which  produces  English  indexes  of  technical  Russian  text. 

Part of the indexing system produces a machine dictionary on magnetic tape. This
dictionary is a computer representation of standard English-Russian technical phrase
dictionaries. A machine dictionary must exist for the same field as the text being
indexed.

The  indexing  portion  of  the  system  operates  upon  the  machine  dictionary.  Russian 
text  phrases  are  matched  against  Russian  dictionary  phrases.  When  a  match  is  found, 
the English translation is extracted from the dictionary. The final index is constructed
from  the  set  of  such  English  translations. 

The  Russian  dictionary  entries  are  in  canonical  form.  The  indexing  system  contains 
reverse  inflection  algorithms  which  transform  text  phrases  to  their  canonical  forms. 

For  Further  Information:  The  complete  report  is  available  in  the  major  Navy  technical 
libraries  and  can  be  obtained  from  the  Defense  Documentation  Center.  A  few  copies 
are  available  for  distribution  by  the  author. 
FOREWORD


This report is Part II of the Annual Progress Report: Automatic
Indexing  and  Abstracting  submitted  to  the  Office  of  Naval  Research 
under  Contract  Nonr  4440(00).  The  work  was  jointly  supported  by 
the  Independent  Research  Program  of  Lockheed  Missiles  &  Space 
Company,  and  the  Office  of  Naval  Research. 
ABSTRACT


The  following  report  describes  a  computer  system  for  the  IBM  7094 
which  produces  English  indexes  of  technical  Russian  text. 

Part  of  the  indexing  system  produces  a  machine  dictionary  on  magnetic 
tape.  This  dictionary  is  a  computer  representation  of  standard  English- 
Russian  technical  phrase  dictionaries. 

The  indexing  portion  of  the  system  matches  Russian  text  phrases  against 
Russian  dictionary  phrases.  Dictionary  phrases  are  in  canonical  form; 
reverse  inflection  algorithms  transform  text  phrases  to  their  canonical 
form.  When  a  match  is  found,  the  English  translation  of  the  match  is 
extracted  from  the  dictionary.  The  final  index  is  constructed  from  the 
set  of  such  English  translations. 
Section 1
INTRODUCTION


1.1 GENERAL DISCUSSION

This report describes a computer system for the IBM 7094 which produces a deep
English index of untranslated scientific Russian text. The index is printed in a
back-of-the-book-type format. Though verbs may appear in a human-produced index of this
type, such an index consists, for the most part, of an alphabetically arranged collection
of nouns and their modifiers. The computer-produced index described here indexes
nouns and their modifiers with great accuracy. If desired, the system will also perform
cross indexing.

In the analysis and programming of the indexing system, the most attention was devoted
to nouns and adjectives. However, the system will index verbs if desired. Only a
certain noun configuration will not, at present, be indexed. This will be discussed in
the body of the report.

This  report  has  been  written  for  readers  having  little,  if  any,  knowledge  of  the  Russian 
language.  This  fact  has  led  to  a  somewhat  lengthy  report  as  Russian  examples  have 
been  used  widely. 

Section 2 discusses the inflectional nature of Russian, and the concepts of paradigm
and reverse inflection. Those familiar with Russian, even in general terms, can
ignore this section.

Section  3  extends  the  meanings  of  paradigm  and  reverse  inflection  to  phrases. 

Sections 4 and 5 describe the construction of the machine dictionary and the Raw Index,
respectively.

Section 6 describes the rules whereby the Final Index is formed from the Raw Index.

Section  7  describes  possible  uses  to  which  the  Indexing  System  may  be  put. 

Appendix  D  uses  a  formal,  compact  notation  to  describe  the  formation  of  the  Dictionary 
Creation  Program  and  the  Raw  Index.  The  verbal  description  of  the  Indexer,  which 
occupies  most  of  the  report,  can  be  read  independently  of  this  section.  On  the  other 
hand,  an  understanding  of  this  section  will  in  itself  give  a  complete  understanding  of 
the  formation  of  the  Raw  Index. 

1.2  TRANSLATION  OF  PHRASES 

It should be stressed that this system produces English indexes of untranslated Russian
texts. It does so by recourse to a computer-implemented technical phrase dictionary.
Such dictionaries, both Russian-English and English-Russian, are commercially
available for a wide variety of technical fields. Part of the indexing system will convert
any such dictionary to a machine dictionary on magnetic tape. A machine
dictionary must be available for the same technical field as the corpus being indexed.

The  necessity  for  using  such  dictionaries  can  be  stated  briefly:  the  translation  of  a 
technical  phrase  is  a  function  of  all  the  words  within  it  taken  together.  Only  occasionally 
is  the  translation  of  a  phrase  equal  to  the  translation  of  the  words  composing  it. 

Figure  1-1  lists  five  Russian  phrases.  The  word-for-word  translations  of  the  five 
phrases  are: 

•  Automatic  telephone  system 

•  Cathode  with  open  upper  end 

•  Contact  combs  for  recalculation 

•  Equally  accessible  bunches 

•  Clearance  with  released  anchor 
1.  АВТОМАТИЧЕСКАЯ ТЕЛЕФОННАЯ СИСТЕМА
    AUTOMATIC TELEPHONE SYSTEM

2.  КАТОД С ОТКРЫТЫМ ВЕРХНИМ КОНЦОМ
    OPEN CATHODE

3.  КОНТАКТНЫЕ ГРЕБЕНКИ ДЛЯ ПЕРЕСЧЕТА
    TRANSLATION FIELD

4.  РАВНОДОСТУПНЫЕ ПУЧКИ
    EQUALLY ACCESSIBLE TRUNK GROUPS

5.  ЗАЗОР ПРИ ОТПУЩЕННОМ ЯКОРЕ
    RELEASED GAP

Fig.  1-1  Phrase  Translations 




Below  each  Russian  phrase  in  Fig.  1-1  appears  the  English  translation  selected  from 
an  electronics  phrase  dictionary.  Only  in  the  first  phrase  are  the  two  translations 
the  same. 
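The contrast between the two kinds of translation can be sketched in a few lines of Python. This is an illustration only: the two small tables are hypothetical stand-ins for a word dictionary and a phrase dictionary, filled with two of the Fig. 1-1 examples (the Cyrillic spellings are a transcription of the figure), and the function names are not taken from the report.

```python
# Hypothetical word-level glosses, taken from the word-for-word list above.
WORD_GLOSSES = {
    "АВТОМАТИЧЕСКАЯ": "AUTOMATIC",
    "ТЕЛЕФОННАЯ": "TELEPHONE",
    "СИСТЕМА": "SYSTEM",
    "РАВНОДОСТУПНЫЕ": "EQUALLY ACCESSIBLE",
    "ПУЧКИ": "BUNCHES",
}

# Hypothetical phrase-level entries, taken from Fig. 1-1.
PHRASE_GLOSSES = {
    "АВТОМАТИЧЕСКАЯ ТЕЛЕФОННАЯ СИСТЕМА": "AUTOMATIC TELEPHONE SYSTEM",
    "РАВНОДОСТУПНЫЕ ПУЧКИ": "EQUALLY ACCESSIBLE TRUNK GROUPS",
}


def word_for_word(phrase):
    """Translate each word separately and join the glosses."""
    return " ".join(WORD_GLOSSES[word] for word in phrase.split())


def phrase_translation(phrase):
    """Translate the phrase as a single dictionary entry."""
    return PHRASE_GLOSSES[phrase]


def translations_agree(phrase):
    """True only when the two methods give the same English."""
    return word_for_word(phrase) == phrase_translation(phrase)
```

Of the two phrases encoded here, only the first gives the same English by both methods, which is the point made by Fig. 1-1.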

1.3  INDEX  ENTRIES:  AN  EXAMPLE 

Before  proceeding  to  the  body  of  the  report,  a  brief  example  of  the  formation  of  an 
index  will  be  demonstrated.  Figure  1-2  illustrates  a  sentence  in  which  six  single 
words  and  three  phrases  are  contained  in  a  phrase  dictionary  on  nuclear  physics. 

It  frequently  happens  that  a  text  phrase  does  not  have  a  translation  in  the  dictionary, 
while  the  elements  composing  it  do  have  dictionary  translations.  In  such  a  case,  all 
elements which are contained in the dictionary are translated and the resulting
combination forms an entry for the index.*

...КАМЕРУ ВИЛЬСОНА В МАГНИТНОМ ПОЛЕ...

CLOUD CHAMBER          MAGNETIC FIELD

CLOUD CHAMBER IN MAGNETIC FIELD


Sometimes  elements  of  text  phrases  are  high  frequency  words  such  as  some,  though 
not  most,  prepositions.  The  translations  of  such  words  will  be  incorporated  into  the 
index  where  possible. 


Figure  1-2  shows  the  sentence  to  be  indexed.  The  phrases  occurring  in  the  nuclear 
dictionary  are  underlined. 


The  index  contains  the  following  items. 

•  Cloud  chamber  in  magnetic  field 

•  Detector  of  particles 


*It will be necessary in the report to discuss text phrases and dictionary phrases. To
avoid confusion, text phrases will appear with a string of dots which represent the
sentence in which the phrase appears.

Fig. 1-2 Approach to Indexing


•  Energy  loss  after  passage  through  lead  absorbers 

•  Lead  absorbers,  energy  loss  after  passage  through 

•  Magnetic  field,  cloud  chamber  in 
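The cross-indexed items in the list above follow a visible pattern: the final term of an entry is moved to the front and the remainder follows after a comma. The sketch below reproduces only that surface pattern. It is an assumption made for illustration; the report's actual rules for forming the Final Index are given in Section 6, and the function name `cross_entry` is hypothetical.

```python
def cross_entry(entry, term):
    """Rotate `term`, which must end `entry`, to the front of the entry.

    Illustrative only: mirrors the pattern of the example index items,
    not the report's Final Index rules.
    """
    head = entry[: len(entry) - len(term)].strip()
    if not entry.lower().endswith(term.lower()) or not head:
        raise ValueError("term must be a proper final part of the entry")
    # Capitalize the rotated term and lower-case the start of the remainder.
    return f"{term.capitalize()}, {head[0].lower()}{head[1:]}"
```

For example, `cross_entry("Cloud chamber in magnetic field", "magnetic field")` yields the entry "Magnetic field, cloud chamber in".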

1.4  THE  INDEXING  SYSTEM 

The  indexing  system  consists  of  two  main  parts,  the  Dictionary  Creation  Program 
(DCP),  and  the  Indexer. 

1.4.1 Dictionary Creation Program

The  DCP  creates  a  machine  dictionary  which  is  used  by  the  Indexer.  Creation  of  the 
machine  dictionary  takes  place  prior  to  and  independently  of  the  Indexer's  operation. 
The  dictionary  itself  is  stored  on  magnetic  tape.  Once  a  dictionary  for  a  given  field 
or  subfield  of  science  has  been  created,  it  is  used  by  the  Indexer  to  index  all  texts  in 
the  same  field. 

1.4.2  The  Indexer 

The operation of the Indexer proceeds in two steps: creation of a Raw Index, and
creation, from the Raw Index, of the Final Index.

•  Raw  Index 

Phrases in the Russian text which is being indexed are matched against
Russian phrases in the machine dictionary. When a match is found, the
English translation of the Russian phrase is retrieved from the dictionary
and placed in the Raw Index, along with information regarding the position
and length of the phrase on the page, and the part-of-speech of the main word
in the phrase.

•  Final  Index 

The Raw Index is examined and, using the information contained in it, a Final
Index is constructed.
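The information carried by one Raw Index item, as listed above, can be pictured as a small record. The following is a minimal sketch under stated assumptions: the field names, the use of a word offset for position, and the three part-of-speech labels are hypothetical choices, since the report does not give the tape layout here.

```python
from dataclasses import dataclass

# Part-of-speech labels assumed for the main word of a phrase.
PARTS_OF_SPEECH = {"noun", "verb", "adjective"}


@dataclass(frozen=True)
class RawIndexEntry:
    english: str         # English translation retrieved from the dictionary
    position: int        # where the matched text phrase starts (assumed unit: words)
    length: int          # number of words in the matched text phrase
    part_of_speech: str  # part of speech of the main word of the phrase


def make_entry(english, position, length, part_of_speech):
    """Build one Raw Index entry, rejecting obviously invalid values."""
    if part_of_speech not in PARTS_OF_SPEECH:
        raise ValueError(f"unknown part of speech: {part_of_speech}")
    if position < 0 or length < 1:
        raise ValueError("position must be >= 0 and length >= 1")
    return RawIndexEntry(english, position, length, part_of_speech)
```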
Section 2

INFLECTIONAL  NATURE  OF  THE  RUSSIAN  LANGUAGE 

2.1  INFLECTION  DEFINED 

Before  discussing  the  indexing  system  in  detail,  it  is  essential  to  understand  something 
of  the  inflection  problem  in  Russian. 

Inflection  is  that  property  of  a  language  by  which  a  particular  relationship  between  two 
words,  usually  nouns  or  pronouns,  is  expressed  by  a  change  in  form  of  one  of  the 
words.  This  type  of  relationship  is  usually  referred  to  as  case. 


English was once a more highly inflected language than at present, but there still
remains a residue of the old case structure. Thus, the pronoun I inflects to me when
it becomes the direct object of a verb or the object of a preposition, just as who changes
to whom in the same circumstances.

In  Russian  there  are  six  cases,  and  two  numbers  -  singular  and  plural.  In  Russian, 
therefore,  a  noun  may  have  as  many  as  twelve  forms.  Figures  2-1  and  2-2  show  the 
case  forms  for  the  three  genders  of  Russian  nouns  -  masculine,  feminine,  and  neuter. 
The  names  of  the  six  cases  are  nominative,  genitive,  dative,  accusative,  instrumental 
and  prepositional. 

Russian adjectives also inflect according to the case of the noun they modify (Fig. 2-3).

Figures 2-1 through 2-3 illustrate the declensions of regular nouns and adjectives.

The set of all inflected forms of a given word is called the paradigm of the word.
Generally,  some  of  the  members  of  the  paradigm  coincide.  The  set  which  contains 
only  distinct  entries  is  called  the  reduced  paradigm. 
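The distinction between a paradigm and its reduced paradigm can be made concrete with a short Python sketch. It uses the twelve case forms of the regular hard masculine noun стол ("table"); the function name is hypothetical and the code is offered only as an illustration of the definition.

```python
# Full paradigm of "стол", in the order nominative, genitive, dative,
# accusative, instrumental, prepositional.
SINGULAR = ["стол", "стола", "столу", "стол", "столом", "столе"]
PLURAL = ["столы", "столов", "столам", "столы", "столами", "столах"]


def reduced_paradigm(forms):
    """Return the distinct forms, keeping first-occurrence order."""
    return list(dict.fromkeys(forms))


paradigm = SINGULAR + PLURAL
reduced = reduced_paradigm(paradigm)
```

Here the paradigm has twelve members but the reduced paradigm has ten, because the accusative coincides with the nominative in both numbers.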
                     Masculine                         Neuter
           Hard      Soft      Soft         Hard      Soft      Soft

Singular

Nom.       стол      музей     дождь        место     поле      здание
Gen.       стола     музея     дождя        места     поля      здания
Dat.       столу     музею     дождю        месту     полю      зданию
Acc.       стол      музей     дождь        место     поле      здание
Instr.     столом    музеем    дождем       местом    полем     зданием
Prep.      столе     музее     дожде        месте     поле      здании

Plural

Nom.       столы     музеи     дожди        места     поля      здания
Gen.       столов    музеев    дождей       мест      полей     зданий
Dat.       столам    музеям    дождям       местам    полям     зданиям
Acc.       столы     музеи     дожди        места     поля      здания
Instr.     столами   музеями   дождями      местами   полями    зданиями
Prep.      столах    музеях    дождях       местах    полях     зданиях

Fig. 2-1 Masculine and Neuter Genders
           Hard             Soft            Soft       Soft

Singular

Nom.       комната          неделя          дверь      фамилия
Gen.       комнаты          недели          двери      фамилии
Dat.       комнате          неделе          двери      фамилии
Acc.       комнату          неделю          дверь      фамилию
Instr.     комнатой (ою)    неделей (ею)    дверью     фамилией (ею)
Prep.      комнате          неделе          двери      фамилии

Plural

Nom.       комнаты          недели          двери      фамилии
Gen.       комнат           недель          дверей     фамилий
Dat.       комнатам         неделям         дверям     фамилиям
Acc.       комнаты          недели          двери      фамилии
Instr.     комнатами        неделями        дверями    фамилиями
Prep.      комнатах         неделях         дверях     фамилиях

Fig. 2-2 Feminine Gender
Hard: -ый (-ий); -ой

                                                   Plural
           Masculine      Neuter     Feminine      All Genders

Nom.       новый          новое      новая         новые
Gen.       нового         нового     новой         новых
Dat.       новому         новому     новой         новым
Acc.       новый (ого)    новое      новую         новые (ых)
Instr.     новым          новым      новой (ою)    новыми
Prep.      новом          новом      новой         новых

Soft: -ий

                                                   Plural
           Masculine      Neuter     Feminine      All Genders

Nom.       синий          синее      синяя         синие
Gen.       синего         синего     синей         синих
Dat.       синему         синему     синей         синим
Acc.       синий (его)    синее      синюю         синие (их)
Instr.     синим          синим      синей (ею)    синими
Prep.      синем          синем      синей         синих

Fig. 2-3 Declension of Adjectives

There are, in addition, a great number of irregular nouns and adjectives whose
inflections differ in varying degree from those shown. We are not concerned at present,
however, with details of the inflectional structure but rather with its size and nature.
The important fact shown in Figs. 2-1 through 2-3 is that, despite the coincidence of
two or more elements of a paradigm, there remains an overpowering multiplicity of
possible forms.


(This report is concerned mainly with adjectives and nouns. Nevertheless, since verbs
will be indexed if desired, they will be discussed where it seems pertinent to do so.
Appendix A contains a listing of verbal forms for the first and second conjugations.)


2.2  INFLECTION  AND  REVERSE  INFLECTION 


In manual dictionaries the paradigm of a word is represented by a single element of the
paradigm. This element is called the canonical form of the paradigm. In an English
dictionary, cathode is the canonical form of the paradigm {cathode, cathodes}.
Dictionary makers assume the ability on the user's part to transform an inflected form
of the word to its canonical form.

It  goes  without  saying  that  there  exist  in  both  English  and  Russian  many  "irregular" 
words,  so  called  because  the  "regular"  transformations  will  not  suffice  to  transform 
the canonical form into elements of its paradigm and vice versa. Nevertheless, the
very  existence  of  canonical  dictionaries  implies  that  the  great  majority  of  words  in 
them  are  susceptible  to  "regular"  transformations. 

The  process  of  transforming  a  canonical  form  into  one  or  more  elements  of  its 
paradigm  will  be  called  inflection.  The  inverse  transformation,  deriving  the  canonical 
form  from  an  element  of  the  reduced  paradigm,  will  be  called  reverse  inflection. 

2.3  TYPES  OF  RUSSIAN  MACHINE  DICTIONARIES 

There  are  two  extreme  forms  which  an  automatic  dictionary  may  take: 

•  An  inflected  or  paradigm  dictionary  in  which  each  element  of  each 
paradigm  is  represented  by  a  distinct  entry 

•  A  canonical  dictionary  in  which  a  paradigm  is  represented  by  a  single 
entry  -  its  canonical  form 

The paradigm dictionary operates with a simple table-look-up program, but it has, of
course, the disadvantage of great size.


A canonical dictionary implies the existence of an algorithm of some complexity
which can perform the reverse inflection transformation on words encountered in text.
Hybrids of these two extremes are, of course, possible.
2.4 THE REVERSE INFLECTION ALGORITHM


Several  years  ago  a  reverse  inflection  algorithm  was  developed  and  programmed  for 
use  in  a  Russian  parsing  program  being  developed  at  Lockheed.  The  algorithm  itself 
is  the  subject  of  a  forthcoming  report,  but  it  is  discussed  in  general  terms  in 
Appendix  B. 

The  dictionary  which  is  used  in  conjunction  with  the  reverse  inflection  algorithm  is  a 
dictionary of canonical forms; the canonical forms are the classically accepted ones -
nominative  singular  for  nouns,  nominative  singular  masculine  gender  for  adjectives, 
and  the  infinitive  for  verbs  and  participles. 

The principle of the algorithm can be stated briefly: potential canonical forms of a
given text word are constructed by removing certain terminal strings of letters and
then adding new terminal strings. After each potential canonical form is constructed,
an attempt is made to find it in the dictionary. If the potential canonical form has no
match in the dictionary, a new potential form is constructed and so on, until a true
canonical form is constructed or until all possible constructions for the word in
question have been exhausted.
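The strip-and-add principle just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the Lockheed algorithm itself: the rule table `ENDING_RULES` is a small hypothetical sample covering a few regular noun and adjective endings, the dictionary is a plain Python set, and the function names are invented for the example.

```python
# Hypothetical (strip, add) rules: remove a terminal string, add a new one.
# The empty pair covers text words that are already in canonical form.
ENDING_RULES = [
    ("", ""),
    ("а", ""), ("у", ""), ("ом", ""), ("е", ""),         # стола, столу, ... -> стол
    ("ы", ""), ("ов", ""), ("ам", ""), ("ами", ""), ("ах", ""),
    ("ы", "а"), ("е", "а"), ("у", "а"), ("ой", "а"),     # комнаты, ... -> комната
    ("ого", "ый"), ("ому", "ый"), ("ая", "ый"), ("ую", "ый"),  # нового, ... -> новый
]


def candidate_forms(word):
    """Yield potential canonical forms of a text word, one rule at a time."""
    for strip, add in ENDING_RULES:
        if word.endswith(strip):
            yield word[: len(word) - len(strip)] + add


def reverse_inflect(word, dictionary):
    """Return the first candidate found in the dictionary, or None when
    all possible constructions have been exhausted."""
    for candidate in candidate_forms(word):
        if candidate in dictionary:
            return candidate
    return None
```

With a dictionary holding стол, комната, and новый, the call `reverse_inflect("комнату", dictionary)` first tries and rejects "комнату" and "комнат" before arriving at "комната", which mirrors the construct-and-look-up loop described above.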

At the time the indexing program was written, then, there existed a reverse
inflection algorithm designed to operate on a canonical dictionary whose entries are
classically defined canonical forms. The dictionary entries are single words, and
the reverse inflection algorithm operates on a single text word at a time. Let us
define such a dictionary as D_s, and the reverse inflection algorithm as R_s. We will
now extend the definitions of this section to a phrase dictionary.
Section 3

PHRASE DICTIONARIES, PHRASES

3.1 PHRASE DICTIONARIES

In the Introduction it was stated that the Indexer uses a dictionary of technical phrases
to  compute  the  index.  These  dictionaries  cover  the  terminology  of  a  given  field  or 
subfield  of  science  and  thus  provide  a  list  of  phrase  descriptors  for  the  field  in  question. 

This is a different type of dictionary from the ordinary dictionary whose entries are
single  words.  We  are  dealing  now  with  a  dictionary  whose  entries  are  phrases. 

3.2  COMMERCIALLY  AVAILABLE  PHRASE  DICTIONARIES 

English-Russian  technical  phrase  dictionaries  are  lists  of  English  phrases  and  their 
Russian  translations.  Single  words  may  occur,  but,  in  general,  these  dictionaries 
are phrase dictionaries.

A  few  dictionaries  of  this  type  have  been  published  in  the  United  States  (where,  of 
course,  the  format  is  usually  Russian-English),  but  for  the  most  part  they  have  been 
compiled  in  the  Soviet  Union.  (References  1  through  6.) 

Most  Soviet  dictionaries,  having  been  prepared  for  translation  of  English  to  Russian, 
are  arranged  alphabetically  according  to  the  English  alphabet. 
3.3 EXAMPLES OF SOME TYPICAL PHRASES


The following is a list of phrases taken from a nuclear physics dictionary:

ПОЛНАЯ МОЩНОСТЬ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
TOTAL REACTION POWER DENSITY

ОРБИТАЛЬНАЯ ПЛОСКОСТЬ
ORBITAL PLANE

ФОТОРОЖДЕНИЕ
PHOTOPRODUCTION

ЭФФЕКТИВНОЕ СЕЧЕНИЕ ДЛЯ ДЕЛЕНИЯ УРАНА
CROSS SECTION FOR URANIUM FISSION

ЭФФЕКТ ПЕРЕНОСА
TRANSFER EFFECT

КОСМИЧЕСКИЙ
COSMIC

ИНДУКТИРОВАТЬ
INDUCE

ДИФФЕРЕНЦИРУЮЩАЯ СХЕМА
DIFFERENTIATING NETWORK

НАНОСИТЬ В ЗАВИСИМОСТИ ОТ
PLOT AGAINST


LOCKHEED MISSILES & SPACE COMPANY


3.4  PHRASE  PARADIGMS,  CANONICAL  FORMS,  REVERSE  INFLECTION 


The  definitions  of  Section  2  can  easily  be  extended  to  include  phrases.  Consider,  for 
example,  the  first  phrase  in  the  above  set  of  examples.  Its  paradigm  is: 


Singular

Nom.     ПОЛНАЯ МОЩНОСТЬ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Gen.     ПОЛНОЙ МОЩНОСТИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Dat.     ПОЛНОЙ МОЩНОСТИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Acc.     ПОЛНУЮ МОЩНОСТЬ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Instr.   ПОЛНОЙ МОЩНОСТЬЮ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Prep.    ПОЛНОЙ МОЩНОСТИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА

Plural

Nom.     ПОЛНЫЕ МОЩНОСТИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Gen.     ПОЛНЫХ МОЩНОСТЕЙ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Dat.     ПОЛНЫМ МОЩНОСТЯМ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Acc.     ПОЛНЫЕ МОЩНОСТИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Instr.   ПОЛНЫМИ МОЩНОСТЯМИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Prep.    ПОЛНЫХ МОЩНОСТЯХ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА


The  reduced  paradigm  is: 


Singular

Nom.               ПОЛНАЯ МОЩНОСТЬ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Gen., Dat., Prep.  ПОЛНОЙ МОЩНОСТИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Acc.               ПОЛНУЮ МОЩНОСТЬ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Instr.             ПОЛНОЙ МОЩНОСТЬЮ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА

Plural

Nom., Acc.         ПОЛНЫЕ МОЩНОСТИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Gen.               ПОЛНЫХ МОЩНОСТЕЙ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Dat.               ПОЛНЫМ МОЩНОСТЯМ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Instr.             ПОЛНЫМИ МОЩНОСТЯМИ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
Prep.              ПОЛНЫХ МОЩНОСТЯХ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА




The cases listed on the left refer to the case of the leftmost noun. We will define
the canonical form of the phrase as the form whose leftmost noun is in the nominative
singular case. Thus in this example

ПОЛНАЯ МОЩНОСТЬ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА

is the canonical form of the phrase paradigm. The process of transforming the
canonical form of a phrase into one or more elements of its paradigm is called inflection
and, as before, the inverse transformation deriving the canonical form of a phrase
from an element of the paradigm is called reverse inflection. We will denote by D_p
a dictionary whose entries are canonical phrases. R_p is the reverse inflection
algorithm that transforms members of the phrase paradigm to canonical form.
Section 4

THE DICTIONARY CREATION PROGRAM


4.1 GENERAL DISCUSSION

An examination of the Russian phrases of a phrase dictionary indicates that each phrase
has within it a main or pivotal word upon which the rest of the phrase, so to speak,
depends. Usually this word is the leftmost noun or verb of the phrase. (If the phrase
consists of a single word, then obviously this is the pivotal word.) This pivotal word
occurs in the canonical form: infinitive for verbs, nominative singular for nouns.

We  will  see  that  this  word  may  automatically  be  identified  with  great  accuracy.  This 
main  or  pivotal  word  will  be  referred  to  from  now  on  as  the  representative  word. 

The  position  of  the  representative  word  within  the  dictionary  phrase  is  defined  as  the 
left  and  right  limits  of  the  phrase  with  respect  to  the  representative  word.  These  left 
and  right  limits  are  called  coordinates. 

The  DCP  selects  a  set  of  representative  words  from  the  phrases  of  the  phrase 
dictionary.  If  a  word  of  this  set  or  an  element  of  the  paradigms  of  this  set  occurs 
in  text,  it  is  a  signal  to  the  Indexer  that  this  text  word  may  be  the  representative  of 
an  entire  text  phrase,  the  equivalent  of  which  is  contained  in  the  machine  dictionary. 

Next, the Indexer retrieves the coordinates of the representative word and uses these
coordinates to examine the text environment of that word which was originally
transformed into the representative word.

In short, the construction of the set of representative words is equivalent to the
establishment of a set of signals to inform the Indexer that the environment of a given text
word must be examined in detail because it may contain a phrase whose equivalent
occurs in the machine dictionary. In addition to the set of representative words, the
DCP creates a set of Russian phrases and a set of English translations of these phrases.
Links between these three sets are also, of course, established so that the Indexer
may thread its way from representative word to Russian phrase represented by that
word to English translation of the phrase.
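The three linked sets can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the report stores the dictionary on magnetic tape and does not give its record layout in this section, so the integer keys, the `RussianPhrase` record, and the function `follow_links` are hypothetical. The phrases are from the nuclear physics examples of Section 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RussianPhrase:
    words: tuple      # canonical dictionary phrase, one word per element
    left: int         # words to the left of the representative word
    right: int        # words to the right of the representative word
    english_key: int  # link to the English translation


# Set 3: English translations.
ENGLISH = {1: "TOTAL REACTION POWER DENSITY", 2: "ORBITAL PLANE", 3: "TRANSFER EFFECT"}

# Set 2: Russian phrases, each linked to its translation.
PHRASES = {
    10: RussianPhrase(("ПОЛНАЯ", "МОЩНОСТЬ", "РЕАКЦИИ", "НА", "ЕДИНИЦУ", "ОБЪЕМА"), 1, 4, 1),
    11: RussianPhrase(("ОРБИТАЛЬНАЯ", "ПЛОСКОСТЬ"), 1, 0, 2),
    12: RussianPhrase(("ЭФФЕКТ", "ПЕРЕНОСА"), 0, 1, 3),
}

# Set 1: representative words, each linked to the phrases it represents.
REPRESENTATIVES = {"МОЩНОСТЬ": [10], "ПЛОСКОСТЬ": [11], "ЭФФЕКТ": [12]}


def follow_links(representative_word):
    """Thread from representative word to Russian phrase to English translation."""
    results = []
    for key in REPRESENTATIVES.get(representative_word, []):
        phrase = PHRASES[key]
        results.append((" ".join(phrase.words), ENGLISH[phrase.english_key]))
    return results
```

A representative word maps to a list of phrase keys because one word may stand for several dictionary phrases.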

4.2 SELECTION OF REPRESENTATIVE WORDS AND THEIR COORDINATES

The  dictionary  creation  program  examines  the  string  of  Russian  words  making  up  the 
Russian  portion  of  a  dictionary  entry  and 

•  Selects  the  representative  word  of  the  phrase 

•  Assigns a part-of-speech category of noun, verb, or adjective* to the
representative word

•  Determines  the  coordinates  of  the  representative  word  within  the  phrase 

The  representative  word  is  defined  as  follows: 

•  The  leftmost  noun  or  verb  in  the  phrase 

•  If there is no noun or verb in the phrase, the phrase is an adjective (or
a string of adjectives); the rightmost adjective is selected as the
representative word

These  rules  imply  an  ability  to  distinguish  between  adjectives,  nouns,  and  verbs. 

Since the Dictionary Creation Program does not have recourse to a dictionary, and is in
fact making the dictionary, it is evident that the separation must be made on
the basis of the actual words occurring in the phrases. Table 4-1 shows the endings
used to partition canonical forms into their part-of-speech categories.


*Participles in phrase dictionaries generally behave syntactically as though they were
attributive adjectives (Appendix A). The canonical form of a participle is generally
considered the infinitive from which the participle derives. We will see that adjectives
which lie to the left of representative words or are representative words themselves
are listed in the machine dictionary by their stems. This is true of participles as
well, the form being the word minus its adjectival ending. If -СЯ or -СЬ had to
be removed from the participle to get to its adjectival ending, then -СЯ or -СЬ
is restored to the participle stem. In summary, participles behave like attributive
adjectives and are handled as such by the indexing system. Each time the word
"adjective" occurs in the report it can be read as "adjective and/or participle."


Table 4-1

ENDINGS USED TO PARTITION CANONICAL FORMS INTO
THEIR PART-OF-SPEECH CATEGORIES


Verb-Part(a)    Adj      Noun      Adj-Noun    Verb

-СЯ             -ЖИЙ     -ОСТЬ     -НИЙ        -ТЬ
-СЬ             -ЧИЙ     -ОЧЬ      -ОЙ         -ТИ
                -ШИЙ     -ЕЧЬ      -ЕЕ
                -ЩИЙ
                -ГИЙ
                -КИЙ
                -ХИЙ
                -ЫЙ
                -ЯЯ
                -АЯ
                -ОЕ
                -ЖИЕ
                -ЧИЕ
                -ШИЕ
                -ЩИЕ
                -ГИЕ
                -КИЕ
                -ХИЕ
                -ЫЕ


(a) If these endings occur, remove them before searching
for other endings. Restore them after suffix examination.


The word endings are examined in conjunction with each other. Thus the word
ПЕРИОДИЧНОСТЬ is not a verb but a noun because the verbal (-ТЬ) ending is
contained in a larger noun (-ОСТЬ) ending. The larger ending takes precedence.

If a word has none of the above endings it is called a noun. Thus СЕЧЕНИЕ is a
noun because it has no ending in the above categories. The endings -ОЧЬ and -ЕЧЬ
are verbal endings as well as noun endings, but the verbs form such a small class of
technical words that it was decided to designate words with such endings as nouns.

НАНОСИТЬ is called a verb only after determining that its verbal (-ТЬ) ending
is not contained in the longer (-ОСТЬ) noun ending. ПРЯМОЙ is called both a
noun and an adjective because its ending is ambiguous.
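The ending test described above, including the precedence of the larger ending and the default to noun, can be sketched in Python. This is an illustration under stated assumptions: the `ENDINGS` table holds only a subset of Table 4-1 (the endings named explicitly in the text plus three common adjective endings), the category labels are hypothetical, and the handling of -СЯ and -СЬ is reduced to setting them aside before the test.

```python
# Subset of Table 4-1; labels are illustrative.
ENDINGS = {
    "ОСТЬ": "noun", "ОЧЬ": "noun", "ЕЧЬ": "noun",
    "ТЬ": "verb", "ТИ": "verb",
    "ЫЙ": "adj", "АЯ": "adj", "ОЕ": "adj",
    "ОЙ": "adj-noun", "НИЙ": "adj-noun", "ЕЕ": "adj-noun",
}
REFLEXIVE_ENDINGS = ("СЯ", "СЬ")


def classify(word):
    """Assign a part-of-speech category from the ending of a canonical form."""
    stem = word
    # Set the reflexive ending aside before looking for other endings.
    for reflexive in REFLEXIVE_ENDINGS:
        if stem.endswith(reflexive):
            stem = stem[: -len(reflexive)]
            break
    matches = [ending for ending in ENDINGS if stem.endswith(ending)]
    if not matches:
        return "noun"  # a word with none of the endings is called a noun
    # The larger ending takes precedence (-ОСТЬ over -ТЬ).
    return ENDINGS[max(matches, key=len)]
```

With this table, `classify("ПЕРИОДИЧНОСТЬ")` matches both -ТЬ and -ОСТЬ and returns "noun", while `classify("НАНОСИТЬ")` matches only -ТЬ and returns "verb".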

4.3 FINDING THE REPRESENTATIVE WORD

The  separation  is  made  on  the  basis  of  endings.  The  method  is  quite  accurate  though 
not  100  percent  accurate.  It  must  also  be  kept  in  mind  that  the  method  described 
below  applies  to  Russian  dictionary  phrases  in  canonical  form,  not  to  other  members 
of  the  canonical  form's  paradigm. 

Definition:  A  terminator  is  an  unambiguous  noun  or  verb. 

Rule  1:  Scan  Russian  phrase  from  left  to  right.  The  scan  halts  at  the  first 
terminator.  This  terminator  is  the  representative  word. 

Rule 2:  The Russian phrase begins with an ambiguous word or a string of
         ambiguous words. Three possibilities arise and are handled as follows:

         (a)  Call the first word a noun and apply Rule 1.

         (b)  The ambiguous string has k words. Call each of the k words
              an adjective. If there is a (k + 1)st word, call it a noun and
              apply Rule 1. If there is no (k + 1)st word, apply Rule 3.

Rule 3:  No terminator is encountered and no ambiguous words are encountered.
         The phrase consists of an adjective or a string of adjectives. The scan
         is terminated by the last word. Call the last word the representative
         word.

The practical result of Rule 2 is that two representative words are selected, along with
two phrases and two linkages to the English translation. Thus, a word ending in -ОЙ
is listed two times as a representative word, once as a noun, once as an adjective.

(Examples  are  shown  in  subsection  4.4.) 

In summary then, the representative word is defined by means of the three rules as:

•  The  first  noun  or  verb  encountered 

•  If  no  noun  and  no  verb  encountered,  then  the  representative  word  is 
the  last  adjective  of  the  leading  adjective  string 
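The three rules can be sketched in Python as follows. This is an illustrative sketch only, not the report's IBM 7094 program: the function names and the class labels ("noun", "verb", "adj", "ambiguous", "other") are assumptions, and the ending-based classification of each word is taken as given input rather than implemented.

```python
from typing import List, Tuple

# (representative word, part of speech, (words to its left, words to its right))
Entry = Tuple[str, str, Tuple[int, int]]


def scan(words: List[str], classes: List[str]) -> Entry:
    """Rule 1, falling back to Rule 3 when no terminator is found."""
    for i, cls in enumerate(classes):
        if cls in ("noun", "verb"):      # Rule 1: halt at the first terminator
            return words[i], cls, (i, len(words) - 1 - i)
    last = len(words) - 1                # Rule 3: the last word represents the phrase
    return words[last], "adj", (last, 0)


def representative_entries(words: List[str], classes: List[str]) -> List[Entry]:
    """Return one entry, or two when the phrase opens with ambiguous words (Rule 2)."""
    k = 0
    while k < len(classes) and classes[k] == "ambiguous":
        k += 1
    if k == 0:
        return [scan(words, classes)]
    # Rule 2(a): call the first word a noun.
    reading_a = ["noun"] + classes[1:]
    # Rule 2(b): call the k ambiguous words adjectives and the next word, if any, a noun.
    reading_b = ["adj"] * k + classes[k:]
    if k < len(classes):
        reading_b[k] = "noun"
    return [scan(words, reading_a), scan(words, reading_b)]
```

Called on a phrase that opens with an ambiguous word, the function returns both readings, which corresponds to the double dictionary entry that Rule 2 produces.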

The phrases previously listed are repeated below with their representative words.
The coordinates of the representative word are easily computed once the
representative word itself has been determined. The coordinates are listed in the
right-hand column. Coordinates of (0,0) indicate that the phrase consists of a single
word, namely the representative word itself.

Phrase                                        Representative word   Coordinates

ПОЛНАЯ МОЩНОСТЬ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА     МОЩНОСТЬ              (1,4)
ОРБИТАЛЬНАЯ ПЛОСКОСТЬ                         ПЛОСКОСТЬ             (1,0)
ФОТОРОЖДЕНИЕ                                  ФОТОРОЖДЕНИЕ          (0,0)
ЭФФЕКТИВНОЕ СЕЧЕНИЕ ДЛЯ ДЕЛЕНИЯ УРАНА         СЕЧЕНИЕ               (1,3)
ЭФФЕКТ ПЕРЕНОСА                               ЭФФЕКТ                (0,1)
КОСМИЧЕСКИЙ                                   КОСМИЧЕСКИЙ           (0,0)
ИНДУКТИРОВАТЬ                                 ИНДУКТИРОВАТЬ         (0,0)
ДИФФЕРЕНЦИРУЮЩАЯ СХЕМА                        СХЕМА                 (1,0)
НАНОСИТЬ В ЗАВИСИМОСТИ ОТ                     НАНОСИТЬ              (0,3)


4.4  AMBIGUOUS  ENDINGS 


Phrases containing -ОЙ, -НИЙ, and -ЕЕ words are handled differently, as has been
stated before.

Dictionary phrase / translation          Item   Entry            Word classes assigned

ПРЯМОЙ ПОТОК / STRAIGHT-THROUGH FLOW     1      ПРЯМОЙ ПОТОК     NOUN
                                         2      ПРЯМОЙ ПОТОК     ADJ NOUN

ВИХРЕВОЙ ТОК / EDDY CURRENT              3      ВИХРЕВОЙ ТОК     NOUN
                                         4      ВИХРЕВОЙ ТОК     ADJ NOUN

ПОЛОНИЙ / POLONIUM                       5      ПОЛОНИЙ          NOUN
                                         6      ПОЛОНИЙ          ADJ

But

КАНАЛ АКТИВНОЙ ЗОНЫ / CORE CHANNEL              КАНАЛ АКТИВНОЙ ЗОНЫ

because the program recognizes КАНАЛ as a noun, hence the representative word,
before it encounters АКТИВНОЙ. The assignment of the correct part-of-speech
to the representative word is important; the indexing program uses this syntactic
information in constructing the final index. The representative words in items 1, 3,
and 6, it should be noted, have been assigned the incorrect part-of-speech. It will be
shown in subsection 5.3 how this error is handled by the system. Let us simply note for
now that the error is unavoidable because the program cannot determine if an -ОЙ, -НИЙ,
or -ЕЕ word is a noun or an adjective; therefore it regards these endings as ambiguous.


4.5 A WORD ABOUT ADJECTIVES


Unlike nouns and verbs, an adjective, if it is a representative word, is carried in the
dictionary not in its classical canonical form, but by its stem. In addition, if a
representative word of a phrase is a noun, then adjectives to the left of the
representative word have had their endings removed. The reason for this is discussed in
subsection 5.2.3, where the discussion arises naturally as part of the indexing procedure.

4.6 REPRESENTATIVE WORDS, THEIR COORDINATES AND PARTS-OF-SPEECH
AS AN ARGUMENT/FUNCTION TABLE

After the Dictionary Creation Program (DCP) has determined for each entry the
representative word and the part-of-speech and coordinates of the representative word,
it proceeds to construct an argument/function table where representative words make up
the argument portion of the table and coordinates and parts-of-speech the function
portion.

The argument table, of course, can contain only distinct entries. Now if a particular
representative word occurs more than once, Rules 1, 2, and 3 will assure that the word
in each occurrence has the same part-of-speech. The coordinates of each word, however,
may differ. Therefore, all sets of coordinates belonging to identical representative
words must be examined so that only the largest coordinates will be entered
in the function portion of the table.

Now when the Indexer finds a representative word in text and retrieves its function, it
knows not only the representative word's part-of-speech, but also the size (i.e., the
number of words, expressed by the coordinate values) of the longest phrases which it
represents.

In text, of course, phrases smaller than the maximum may occur. Thus, the Indexer
must construct and seek, for any given representative word, all possible text phrases
that surround it up to the maximum size permitted by the function coordinates.

Example 1: The following phrases, having the same representative word, occur
in a dictionary.

(1) method / МЕТОД                                               (0,0)
(2) curve fitting method / МЕТОД ПОДГОНКИ КРИВОЙ                 (0,2)
(3) activation method / АКТИВАЦИОННЫЙ МЕТОД                      (1,0)
(4) radiation prospecting method / РАДИАЦИОННЫЙ МЕТОД ПОИСКОВ    (1,1)

Argument    Function
МЕТОД       noun, (1,2)

LOCKHEED MISSILES & SPACE COMPANY


The coordinates (1,2) are the left and right limits of the largest possible phrases
whose representative word is МЕТОД.

If, in the following text sequence, w3 is the representative word with coordinates,
say, (1,2)

... w1 w2 w3 w4 w5 w6 ...

then the possible sets of text phrases based on w3 and coordinates (1,2) are

... w2 w3 w4 w5 ...    (1,2)
... w2 w3 w4 ...       (1,1)
... w3 w4 w5 ...       (0,2)
... w2 w3 ...          (1,0)
... w3 w4 ...          (0,1)
... w3 ...             (0,0)

In the first two phrases the representative word, w3, is embedded in the phrase
while in the remaining four phrases w3 is the left or right limit of the phrase.
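The merging of coordinates and the enumeration of candidate text phrases can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, coordinates are modelled as (left, right) word counts, and the ordering (longest phrase first, then larger left limit first) is chosen to reproduce the listing above.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (words to the left, words to the right)


def merge_coordinates(coords: List[Coord]) -> Coord:
    """Keep only the largest left and right limits for one representative word."""
    return max(l for l, _ in coords), max(r for _, r in coords)


def candidate_spans(left: int, right: int) -> List[Coord]:
    """All (k, m) with 0 <= k <= left and 0 <= m <= right, longest phrase first."""
    spans = [(k, m) for k in range(left + 1) for m in range(right + 1)]
    return sorted(spans, key=lambda km: (-(km[0] + km[1]), -km[0]))


def text_phrases(words: List[str], i: int, spans: List[Coord]) -> List[List[str]]:
    """Cut the candidate phrases around position i out of the text."""
    return [words[i - k:i + m + 1] for k, m in spans
            if i - k >= 0 and i + m < len(words)]
```

For the four coordinate pairs of Example 1, `merge_coordinates` yields (1,2), and `candidate_spans(1, 2)` yields the six spans in the order shown in the listing.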

Operationally it is useful to distinguish these two types of coordinates - those
representing an embedded representative word and those representing a representative
word at the phrase's limits. The reason for this distinction will be made clear by the
next example. Suppose the following two dictionary phrases have the same representative
word.

Example 2

(1) powder-diffraction method / МЕТОД ДИФРАКЦИИ НА ПОРОШКЕ       (0,3)
(2) radiation prospecting method / РАДИАЦИОННЫЙ МЕТОД ПОИСКОВ    (1,1)

If we do not distinguish between the two kinds of coordinates we get, as before

Argument    Function
МЕТОД       noun, (1,3)

and again if w3 = МЕТОД in the sequence w1 w2 w3 w4 ..., the possible text phrases are

... w2 w3 w4 w5 w6 ...    (1,3)
... w2 w3 w4 w5 ...       (1,2)
... w2 w3 w4 ...          (1,1)
... w2 w3 ...             (1,0)
... w3 w4 w5 w6 ...       (0,3)
... w3 w4 w5 ...          (0,2)
... w3 w4 ...             (0,1)
... w3 ...                (0,0)

The first two phrases (coordinates (1,3) and (1,2)) are combinations which do not occur
in the dictionary and which need never be formed by the Indexer if the functions (0,3)
and (1,1) are kept separate.

Therefore, the argument/function pairs are

Argument    Function
МЕТОД       noun, (0,3), (1,1)

leading to the following text phrases - three fewer than in the previous case where
coordinates were simply merged.

... w2 w3 w4 ...          (1,1)
... w3 w4 w5 w6 ...       (0,3)
... w3 w4 w5 ...          (0,2)
... w3 w4 ...             (0,1)
... w3 ...                (0,0)

The Indexer forms, and searches the dictionary for, the largest phrases first. When
a text phrase is found to have a match in the dictionary, the Indexer terminates potential
phrase construction based on the particular representative word it is operating with.
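The separated coordinates of Example 2 and the largest-first search can be sketched together as follows. This is a hedged illustration, not the report's program: the names are hypothetical, the dictionary is a plain Python mapping keyed by phrase strings rather than by logical sums, and the expansion rule (a one-sided coordinate expands to all shorter one-sided phrases, an embedded coordinate contributes only itself) is an assumption chosen because it reproduces the five phrases listed for Example 2.

```python
from typing import Dict, List, Optional, Set, Tuple

Coord = Tuple[int, int]  # (words to the left, words to the right)


def spans_from_separate(coords: List[Coord]) -> List[Coord]:
    """Expand separately stored coordinates into candidate spans, longest first."""
    spans: Set[Coord] = {(0, 0)}
    for left, right in coords:
        if left and right:
            spans.add((left, right))  # embedded: only the phrase itself (assumed)
        else:
            spans.update((k, m) for k in range(left + 1) for m in range(right + 1))
    return sorted(spans, key=lambda km: (-(km[0] + km[1]), -km[0]))


def first_match(words: List[str], i: int, spans: List[Coord],
                dictionary: Dict[str, str]) -> Optional[str]:
    """Try the largest phrases first and stop at the first dictionary match."""
    for k, m in spans:
        if i - k < 0 or i + m >= len(words):
            continue  # the phrase would run off the text
        phrase = " ".join(words[i - k:i + m + 1])
        if phrase in dictionary:
            return dictionary[phrase]  # shorter nested phrases are never tried
    return None
```

Because the search returns on the first hit, a text containing МЕТОД ПОДГОНКИ КРИВОЙ yields "curve fitting method" and never reaches the shorter entry "method".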

Since  the  largest  phrases  are  found  first,  this  procedure  ensures  against  the  double 
indexing  of  nested  phrases.  Thus,  "curve  fitting  method"  is  the  indexed  item,  not 
"method." 

4.7  REPRESENTATION  OF  PHRASES 

We  have  discussed  selection  of  the  representative  word.  We  have  also  discussed 
coordinates  and  have  shown  how  coordinates  of  different  phrases  having  the  same 
representative  word  are  merged.  Now  we  will  show  how  the  Russian  phrase  itself  is 
represented  in  the  machine  dictionary. 

It would be extremely cumbersome to work with an actual list of Russian phrases,
trying to match elements of such a list with another list computed from the text. The
Russian phrase as a phrase - as a string, that is, of words - does not occur in the
machine dictionary. It is represented instead by its logical sum. This means of
argument compression represents the phrase as a binary number occupying one
machine word. Appendix C shows how logical sums are formed.

4.8 LOGICAL SUMS OF RUSSIAN PHRASES AND ENGLISH TRANSLATIONS AS AN
ARGUMENT/FUNCTION TABLE

This argument/function table consists of the logical sums of Russian phrases in the
argument portion and the locations of the English translations in the function portion.
The sums are arranged in ascending order so that they are susceptible to a binary
search.

It is possible, though certainly not probable, that two distinct phrases could produce
identical logical sums. A test case of 8,500 English words and phrases produced only
50 duplicates, and most of these were caused by two-word phrases in inverse order.
For example, "index arithmetic" and "arithmetic index" produced the same logical sum.

It  is  not  known  whether  such  a  compression  device  has  ever  been  used  with  Russian 
phrases. 
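The idea of compressing a phrase to one machine word and locating it by binary search can be illustrated as below. This sketch does not reproduce Appendix C: the per-word code (a truncated BLAKE2 hash) and the combination by addition modulo 2^36 are stand-ins chosen only because they fit one 36-bit word and, like the scheme described above, give the same value for the same words in a different order. All names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Optional, Tuple

WORD_BITS = 36                      # one IBM 7094 machine word
MASK = (1 << WORD_BITS) - 1


def word_code(word: str) -> int:
    """Stand-in 36-bit code for one word (assumption, not the report's coding)."""
    digest = hashlib.blake2b(word.encode("utf-8"), digest_size=5).digest()
    return int.from_bytes(digest, "big") & MASK


def logical_sum(phrase: str) -> int:
    """Order-insensitive compression of a phrase into one 36-bit value."""
    return sum(word_code(w) for w in phrase.split()) & MASK


def build_table(entries: List[Tuple[str, str]]) -> Tuple[List[int], List[str]]:
    """Arguments (sums) in ascending order, functions (translations) alongside."""
    table = sorted((logical_sum(ru), en) for ru, en in entries)
    return [s for s, _ in table], [en for _, en in table]


def lookup(keys: List[int], values: List[str], phrase: str) -> Optional[str]:
    """Binary search of the ascending argument table."""
    s = logical_sum(phrase)
    pos = bisect.bisect_left(keys, s)
    if pos < len(keys) and keys[pos] == s:
        return values[pos]
    return None
```

The order-insensitivity of the stand-in also reproduces the kind of duplicate noted above, where a two-word phrase and its inverse share one sum.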

4.9  FORMAT  OF  THE  COMPUTER  DICTIONARY 

We  are  now  in  a  position  to  describe  the  structure  of  the  machine  dictionary. 

First,  for  each  letter  of  the  English  alphabet  occurring  in  the  manual  dictionary, 
three  files  of  information  are  created  as  follows: 

FILE  A 

An argument/function table. The arguments are representative words. The functions
are the parts-of-speech and the coordinates of the representative words.

FILE B

Also an argument/function table. The arguments are the logical sums of Russian
phrases. The functions are the locations in FILE C of the English translations of those
Russian phrases whose logical sums are contained in the argument portion of this file.

FILE  C 

Not  an  argument/function  table.  Simply  a  list  of  English  phrases,  the  Russian  equiv¬ 
alents  of  which  have  been  logically  summed  and  stored  in  the  argument  portion  of 
FILE  B. 

This threefold structure exists for each letter of the English alphabet which occurred
in the phrase dictionary. If 25 letters appeared there will be 75 files.

A schematic picture of the dictionary is as follows, where FILE A (A) means the A file
for the English letter A.


FILE  A  (A) 
FILE  B  (A) 
FILE  A  (B) 
FILE  B  (B) 
FILE  A  (C) 
FILE  B  (C) 


FILE  A  (Z) 
FILE  B  (Z) 
FILE  C  (A) 
FILE  C  (B) 
FILE  C  (C) 


FILE  C  (Z) 


Except for a coming discussion in subsection 5.2.3, where the reasons for carrying
certain adjectives in a stem form are discussed, the dictionary system used by the
Indexer has now been completely described.

In summary, the dictionary system is actually composed of two distinct types of
dictionary (Files A and B) and an English buffer (File C). R_s, the reverse inflection
algorithm for words, and R_p, the reverse inflection algorithm for phrases, operate
in conjunction with Files A and B, respectively, because the argument portions of
Files A and B correspond to D_s (canonical dictionary of words) and D_p (canonical
dictionary of phrases), respectively.

Section 5
THE RAW INDEX


5.1 THE ALGORITHM

The algorithm which produces English translations for intervals of ordered Russian
words (i.e., the Raw Index) is described in Appendix D by means of a mathematical
notation which can be read as a flow chart. The algorithm will now be described
verbally in conjunction with an example.

• A page of text is read into the computer's core.

• A text word w_i is successfully transformed by R_s into its canonical form (i
is the position of the word on the page). This can only happen if the canonical
form of w_i, w, is a representative word (i.e., if it is in the set of A files).

• The Indexer retrieves w's coordinates (l, r) and part-of-speech. With the
coordinates the Indexer computes the set of phrases w_(i-k) ... w w_(i+1) ...
w_(i+m) (0 <= k <= l, 0 <= m <= r). Note that w, the canonical form of w_i, and
not w_i itself, occurs in each element of the phrase set.

• For each phrase in the set of phrases, a logical sum is computed (Appendix C).
Then for each sum (starting with the one representing the longest word-string),
the Indexer seeks a match in that B file corresponding to the A file where w
was located.

• If a match is found, the Indexer has in fact transformed a text phrase to its
canonical form. (See subsection 5.2 for details of this transformation.) Now
the logical sum is simply a Russian phrase in a compressed form. The
English equivalent of this Russian phrase is now retrieved from FILE C
corresponding to FILE B where the sum match was found.

• The English translation, the part-of-speech of the Russian representative
word, and the left and right sentence limits of the successfully transformed
Russian phrase are now stored in a Raw Index Matrix in a row corresponding
to the position of w on the original page. In addition, Russian prepositions,
conjunctions, and other high-frequency words are stored in the Raw Index
Matrix in rows corresponding to their positions on the page.

• The Final Index (Section 6) is constructed from elements of the Raw Index
Matrix.
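The steps above can be traced in a small Python sketch using the text fragment of subsection 5.6. Everything named here is an assumption made for illustration: `CANONICAL` stands in for the reverse inflection algorithm of Appendix B, `FILE_B` is keyed by the phrase itself rather than by its logical sum and holds the English text directly (so FILE C is folded in), and the short list of adjectival endings anticipates the ending removal described in subsection 5.2.3.

```python
from typing import Dict, List, Tuple

ADJ_ENDINGS = ("ЫХ", "ЫМ", "ЫЙ", "ОЕ", "АЯ")  # hypothetical and far from complete

# Stand-in for R_s: text form -> canonical form of a representative word.
CANONICAL: Dict[str, str] = {"ИСТОЧНИКОВ": "ИСТОЧНИК"}

# FILE A: representative word -> (part of speech, (left, right) coordinates).
FILE_A: Dict[str, Tuple[str, Tuple[int, int]]] = {"ИСТОЧНИК": ("noun", (1, 0))}

# FILE B and FILE C folded together: phrase -> English translation.
FILE_B: Dict[str, str] = {"ИСТОЧНИК": "Source",
                          "РАДИОАКТИВН ИСТОЧНИК": "Radioactive source"}


def strip_ending(word: str) -> str:
    """Remove an adjectival ending if one of the listed endings is present."""
    for ending in ADJ_ENDINGS:
        if word.endswith(ending) and len(word) > len(ending):
            return word[:-len(ending)]
    return word


def raw_index(words: List[str]) -> Dict[int, Tuple[str, str, Tuple[int, int]]]:
    """Map page position -> (translation, part of speech, (left limit, right limit))."""
    rows = {}
    for i, word in enumerate(words):
        canon = CANONICAL.get(word)
        if canon is None or canon not in FILE_A:
            continue  # not a member of any representative word's paradigm
        pos, (left, right) = FILE_A[canon]
        spans = sorted(((k, m) for k in range(left + 1) for m in range(right + 1)),
                       key=lambda km: -(km[0] + km[1]))  # longest phrase first
        for k, m in spans:
            if i - k < 0 or i + m >= len(words):
                continue
            before = words[i - k:i]
            if pos == "noun":
                before = [strip_ending(w) for w in before]
            phrase = " ".join(before + [canon] + words[i + 1:i + m + 1])
            if phrase in FILE_B:
                rows[i] = (FILE_B[phrase], pos, (i - k, i + m))
                break  # stop at the first (largest) match
    return rows
```

Run on the fragment ЧАСТИЦ ВБЛИЗИ РАДИОАКТИВНЫХ ИСТОЧНИКОВ ПРЕДПОЛОЖИЛ, the sketch stores "Radioactive source" as a noun in the row of ИСТОЧНИКОВ, with the positions of РАДИОАКТИВНЫХ and ИСТОЧНИКОВ as sentence limits.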

5.2  TRANSFORMATION  OF  TEXT  PHRASE  TO  CANONICAL  FORM 

The selection of a representative word from a phrase and the computing of its
coordinates can be considered a device for determining phrase limits in text, for
determining, that is, text phrases to be used as input to R_p.

Now we will discuss the transformation of the text phrase to its canonical form. We
will discuss, that is, the reverse inflection algorithm for phrases, R_p.

First, let us assume that the text phrase which has been isolated is a member of some
paradigm whose canonical form is contained in the dictionary. (If this is not the case,
then R_p will fail and the Indexer will pass on to the next potential text phrase.)

The  text  phrase  and  the  dictionary  phrase  contain,  at  most,  three  components: 

•  The  representative  word 

•  The  word  string  following  the  representative  word 

•  The  word  string  preceding  the  representative  word 

Either word string may be empty, but, in the general form considered here, both
exist. The transformation of the text phrase to its canonical form can be considered
as the transformation of each text component to its dictionary component.

5.2.1 Transformation of Representative Word

The text representative word has already been transformed to its canonical form by
R_s. This was necessary to obtain the phrase coordinates in the first place.

5.2.2 Words Following Representative Word

These  words  do  not  have  to  be  transformed.  An  examination  of  the  list  on  page  3-2 
shows  that  the  configuration  of  these  words  with  respect  to  each  other  and  with  respect 
to  the  representative  word  is  fixed.  Further,  these  words  are  contiguous  (as  are  all 
the  words  in  the  phrase,  for  that  matter).  Finally,  the  cases  of  these  words  are  fixed. 
Thus,  the  words  following  the  representative  word  occur  in  text  exactly  as  they  occur 
in  the  canonical  form  of  the  phrase. 

5.2.3  Words  Preceding  Representative  Word 

The only portions of a phrase which may inflect according to its use within the sentence
are the representative word and the words preceding it.

The representative word has been transformed by R_s. Now we will examine the
preceding words.

We stated in Section 2 that the inverse inflection algorithm, R_s, transformed verbs
to the infinitive, nouns to the nominative singular, and adjectives to the masculine
nominative singular.

We will now show that transforming text adjectives to the masculine nominative
singular will not necessarily lead to the proper dictionary phrase, and that,
consequently, the adjective-transforming routine of R_s must be slightly altered.

Suppose  the  following  phrase  is  in  the  computer  dictionary. 

(1) ЭФФЕКТИВНОЕ СЕЧЕНИЕ / EFFECTIVE CROSS-SECTION

СЕЧЕНИЕ is the representative word. Now suppose the following
phrase occurs in text in the instrumental case

(2) ...ЭФФЕКТИВНЫМ СЕЧЕНИЕМ...

which the reverse inflection algorithm, R_s, operating on each word transforms* to

(3) ...ЭФФЕКТИВНЫЙ СЕЧЕНИЕ...

Phrase 3 is not the same as phrase 1, the dictionary phrase. (In fact, phrase 3 is
grammatically incorrect - a masculine adjective modifying a neuter noun.) In any
event, the English translation "effective cross-section" would not be regarded as a
potential indexable phrase because the Russian equivalent is being incorrectly
constructed from text.

* We must assume for this example that ЭФФЕКТИВНЫЙ is in the dictionary. If
it were not, then R_s could not transform the instrumental text form to the
nominative (dictionary) form.

The difficulty here arises because adjectives agree in case, number, and gender with
the nouns they modify. In a dictionary entry an adjective modifying a neuter or
feminine noun is itself in the neuter or feminine nominative form. But the reverse
inflection algorithm, R_s, operating on text words transforms all adjectives to the
masculine singular nominative form. The result is that the Indexer constructs an
incorrect hybrid noun unit - the adjective in the masculine singular nominative and the
noun in the feminine or neuter singular nominative. A correct match with the noun
phrase contained in the dictionary cannot, of course, be made. This difficulty has
been resolved by altering the procedure described in subsection 4.3. Under this
alteration, adjectives preceding a representative word which is a noun have their
adjectival endings removed. If the phrase consists of an adjective or a string of
adjectives, then the adjectival endings are removed from each word. This means
that by the time the Dictionary Creation Program has found the representative word,
any adjectives preceding it have had their endings removed. If the representative
word is itself an adjective, then its ending also is removed. Thus the phrase
ЭФФЕКТИВНОЕ СЕЧЕНИЕ is contained in the dictionary as

ЭФФЕКТИВН СЕЧЕНИЕ

The Indexer, when it creates phrases from text, removes the same endings from the
proper adjectives.
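The effect of removing adjectival endings on both sides, dictionary and text, can be sketched as follows. The list of endings is a small hypothetical sample, not the table used by the report's programs, and the canonical form of the representative word is passed in as if R_s had already produced it.

```python
from typing import List

# Hypothetical sample of adjectival endings; the real table is not reproduced here.
ADJECTIVAL_ENDINGS = ("ЫМИ", "ОГО", "ЫЙ", "ИЙ", "ОЙ", "АЯ", "ОЕ", "ЫЕ",
                      "ЫХ", "ЫМ", "ОМ", "УЮ")


def strip_adjectival_ending(word: str) -> str:
    """Remove the longest matching adjectival ending, leaving the stem."""
    for ending in sorted(ADJECTIVAL_ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return word[:-len(ending)]
    return word


def phrase_key(words: List[str], rep_index: int, rep_canonical: str) -> str:
    """Form the phrase as it is matched: stems to the left, canonical
    representative word, and the following words exactly as they stand."""
    left = [strip_adjectival_ending(w) for w in words[:rep_index]]
    return " ".join(left + [rep_canonical] + words[rep_index + 1:])
```

Applied to the dictionary phrase ЭФФЕКТИВНОЕ СЕЧЕНИЕ and to the instrumental text phrase ЭФФЕКТИВНЫМ СЕЧЕНИЕМ, `phrase_key` gives the same string, ЭФФЕКТИВН СЕЧЕНИЕ, so the two can be matched regardless of the adjective's gender and case.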

The reverse inflection algorithm (Appendix B) is so structured that when it correctly
creates a canonical form it knows the part-of-speech of that form. Now, if the Indexer
transforms a text word to a canonical form, and that form is a noun, and its
coordinates indicate a left limit other than zero, then the Indexer, when scanning left
in text to form the text phrase, must remove adjectival endings from words lying to the
left of the text representative word. Thus

...ЭФФЕКТИВНЫМ СЕЧЕНИЕМ...

becomes

...ЭФФЕКТИВН СЕЧЕНИЕ...

which is precisely the way the phrase occurs in the dictionary. Note that ЭФФЕКТИВН
need not appear in the dictionary as a representative word, because it is contained in
the dictionary in the logical sum representing the phrase ЭФФЕКТИВН СЕЧЕНИЕ.
The assumption here, of course, is that an adjective string lying to the left of a noun
and contiguous with it modifies the noun - a dangerous assumption in literary Russian,
but safe enough for scientific text.

5.3 COMPUTER REPRESENTATION OF ADJECTIVES


The phrases of page 3-2 are repeated below. Adjectival endings have been removed
from adjectives preceding the representative word.

ПОЛН МОЩНОСТЬ РЕАКЦИИ НА ЕДИНИЦУ ОБЪЕМА
ОРБИТАЛЬН ПЛОСКОСТЬ
ФОТОРОЖДЕНИЕ
ЭФФЕКТИВН СЕЧЕНИЕ ДЛЯ ДЕЛЕНИЯ УРАНА
ЭФФЕКТ ПЕРЕНОСА
КОСМИЧЕСК
ИНДУКТИРОВАТЬ
ДИФФЕРЕНЦИРУЮЩ СХЕМА
НАНОСИТЬ В ЗАВИСИМОСТИ ОТ


Double dictionary entries are constructed for phrases beginning with a string of
ambiguous words:

Dictionary phrase / translation          Entry            Word classes assigned

ПРЯМОЙ ПОТОК / STRAIGHT-THROUGH FLOW     ПРЯМОЙ ПОТОК     NOUN
                                         ПРЯМ ПОТОК       ADJ NOUN

ВИХРЕВОЙ ТОК / EDDY CURRENT              ВИХРЕВОЙ ТОК     NOUN
                                         ВИХРЕВ ТОК       ADJ NOUN

ПОЛОНИЙ / POLONIUM                       ПОЛОНИЙ          NOUN
                                         ПОЛО             ADJ

We can now see more clearly the reason for creating double entries for -ОЙ, -ЕЕ,
and -НИЙ phrases. One of the two representative words - the form with the
adjectival ending - will never be located in the dictionary because if it or a member
of its paradigm occurs in text, the adjectival ending will be removed prior to the
dictionary search. And removing the adjectival ending leads to the correct (i.e., the
dictionary) representation of the adjective. The incorrect representative word is thus
a "wasted" entry, the price paid for making the DCP automatic.

5.4  INDEXING  FAILURES:  HOW  THEY  CAN  BE  CORRECTED 

Certain configurations of text words will not be indexed properly. It is not believed
that these types of configuration occur often enough to be considered a serious problem,
but they must be mentioned for completeness. In any case, they can be corrected by
making the DCP semi-automatic instead of fully automatic.

5.4.1 Representative Word a Plural Noun

The reverse inflection algorithm, R_s, transforms text nouns to the nominative
singular and attempts to find a match in the dictionary set of representative words.
Now if a noun in the set of representative words is in the nominative plural, e.g.,

ЛУЧИ in КОСМИЧЕСКИЕ ЛУЧИ / COSMIC RAYS

a member of the noun's paradigm occurring in text will be transformed to the singular
form, which does not occur in the dictionary; hence no match can be made.

This error can be corrected by altering the reverse inflection algorithm so that it
forms, for nouns, both the nominative singular and nominative plural forms.

Unfortunately, such a change will also increase the processing time. It should be
pointed out that the great majority of nouns in a piece of text will not be elements of
the paradigm of any representative word. Nevertheless, each noun must be processed
by R_s to establish that it is or is not a member of a representative word paradigm.
Changing the algorithm R_s so that it forms the nominative plural would require the
reprocessing of a given word in its plural forms, if processing of the singular forms
fails.

5.4.2 Representative Word a Noun in Adjectival Form

Some Russian nouns are, morphologically, adjectives

(СТОЛОВАЯ / DINING ROOM, ДАННЫЕ / DATA)

If such a form is a representative word, it will have its adjectival ending removed
(subsection 5.2.3). A member of its paradigm occurring in text will have its ending
removed. A match will be made, and the English translation will be stored in the
Raw Index Matrix. But it will have the wrong part-of-speech - adjective instead of noun.
The English translation will thus appear correctly in the simple index, but quite likely
will appear incorrectly (if it appears at all) in the complex index. (The rules for
forming the complex index could undoubtedly be altered to handle such nouns
satisfactorily. This report, however, describes the system as it now stands.)

5.4.3  Miscellaneous  Forms 

Occasionally, though not often, miscellaneous forms occur in the dictionary, such as a
preposition followed by a noun, an adjective followed by a prepositional phrase, or a
verb preceded by an adverb. Their occurrence is sufficiently rare that as yet no special
provision has been made for handling them by DCP.

5.5 SEMI-AUTOMATIC DICTIONARY


The errors described in the indexing system can all be traced to the desire to make
the Dictionary Creation Program fully automatic. Relaxing this requirement - allowing
the DCP to work in tandem with someone possessing a small knowledge of Russian -
would, we believe, eliminate the errors discussed.

DCP would, in this version, be a two-pass system. The first pass would create the
type of dictionary that has been described. Suspicious entries - those, for example,
whose representative words are potential plural nouns - would be printed out with
their English translations. A human being would then examine the print-out and make
necessary changes. (Thus, a plural noun would be changed by the human to the
singular form so that R_s would successfully operate upon a member of its paradigm
occurring in text.)

The  second  pass  would  simply  merge  the  corrected  entries  with  the  dictionary  created 
on  the  first  pass. 

5.6  EXAMPLE  OF  ENTRY  SELECTION  FOR  THE  RAW  INDEX 


Text:       ...  (}    ЧАСТИЦ   ВБЛИЗИ   РАДИОАКТИВНЫХ   ИСТОЧНИКОВ   ПРЕДПОЛОЖИЛ  ...
Position:   ...  i-4   i-3      i-2      i-1             i            i+1

{ИСТОЧНИК, i}

FILE A
ИСТОЧНИК    noun, (1,0)

{i, noun, (1,0), ИСТОЧНИК}

{i, noun, ИСТОЧНИК, РАДИОАКТИВНЫХ ИСТОЧНИКОВ, ИСТОЧНИКОВ}

{i, noun, РАДИОАКТИВН ИСТОЧНИК, ИСТОЧНИК}

{i, noun, 775141651767, 632403460445}

FILE B
632403460445    Source
775141651767    Radioactive Source

{noun, Radioactive source, i-1, i}

Finally

[(i-1, i) <Radioactive source, noun>]

Section 6
FINAL INDEX


6.1 GENERAL DISCUSSION

After preliminary processing, the Indexer is ready to construct the index items from
the Raw Index. The Raw Index can be viewed as a matrix, M. M, at this point, is
a blend of Russian high-frequency words, English phrases constructed from certain
Russian words in the original sentence, and the parts of speech of those Russian
words. The positions of the Russian words and of the English phrases with respect
to each other in the original sentence are preserved in M.

The  Indexer  can  produce  two  types  of  Final  Index:  a  Simple  Index  and  a  Complex  Index. 

6.2  SIMPLE  INDEX 

This is a simple listing, alphabetically arranged and with duplicate entries eliminated,
of the English phrases in the Raw Index and the page numbers on which they occur.

There  is  no  cross  indexing. 
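A Simple Index of this kind can be sketched in a few lines of Python. The input format, a list of (English phrase, page number) pairs taken from the Raw Index, and the function name are assumptions made for the illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def simple_index(raw_entries: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Alphabetical listing of phrases with their page numbers, duplicates removed."""
    pages: Dict[str, Set[int]] = {}
    for phrase, page in raw_entries:
        pages.setdefault(phrase, set()).add(page)  # a set drops duplicate entries
    return [(phrase, sorted(pages[phrase]))
            for phrase in sorted(pages, key=str.lower)]
```

Each phrase appears once, followed by the sorted pages on which it occurs; no cross indexing is attempted.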

6.3 COMPLEX INDEX

The Complex Index is also formed from the Raw Index. The information contained in
the Raw Index - parts of speech of Russian representative words, sentence limits of
the original Russian phrase, and English translation of the Russian phrase - allows
index entries to be constructed using syntactic information. The Complex Index is also
cross-referenced. Index items are selected by examining the first column of the Raw
Index Matrix. This column contains part-of-speech information and original sentence
limits of Russian phrases.




Index items are constructed using the syntactic building blocks of noun, adjective,
preposition, and verb. Index items are defined below in terms of this syntactic
information. To avoid cluttering up the notation, the abbreviations used will be taken
to mean the English translation of the part of speech indicated. Thus n does not
mean a noun, but a particular English phrase behaving as a noun. Similarly

aj  = English phrase behaving as an adjective
v   = English phrase behaving as a verb
pr  = English word behaving as a preposition

but

pr* = a Russian preposition

pr* is a Russian preposition which cannot be translated with a high degree of accuracy.
Some prepositions can be translated with a reasonable degree of accuracy. Their
translation is denoted by pr.


The following prepositions are being translated.

Russian Preposition    pr

В                      in
ДЛЯ                    for
ДО                     to
К                      to
МЕЖДУ                  between
О                      about
С                      with
ПОСЛЕ                  after

We also define a noun unit (nut) as follows:

nut = n
    = aj
    = aj^k n




In  the  descriptions  to  follow  it  must  be  remembered  that  aj  ,  v  ,  n  ,  pr  ,  and  nut 
represent  English  words  and  phrases ,  and  pr*  represents  a  Russian  word. 


The  structure  of  the  index  items  can  be  shown  conveniently  by  a  tree  structure.  A 
noun  unit  appears  at  the  top  of  the  tree.  Nodes  of  the  tree  below  the  top  node  represent 
English  phrases  lying  to  the  right  of  the  leading  phrase.  Adjacent  nodes  on  the  tree 
represent  contiguous  phrases  in  the  sentence.  A  branch  at  a  given  node  indicates  that 
the  possibilities  indicated  may  follow  the  phrase  represented  by  the  branch  node.  A 
continuous  line  drawn  from  the  top  node  through  lower  nodes  gives  the  structure  of  an 
index  item  except  that  if  pr*  occurs  in  a  node  the  index  item  terminates  at  the 
previous  node.  If  two  connected  nodes  are  each  a  nut,  then  the  English  preposition 
"of"  appears  on  the  connecting  line  to  indicate  that  it  is  to  be  inserted  between  the  two 
noun  units. 


Example.

Index items for rightmost branch are:

•  scattering  cross  section  of  mesons  of  high  mass  from  deuterium 

•  deuterium,  scattering  cross  section  of  mesons  of  high  mass  from 

Index  items  for  sub-branch  of  rightmost  branch  are: 

•  scattering  cross  section  of  mesons  from  light  nuclei 

•  light  nuclei,  scattering  cross  section  of  mesons  from 

Index  items  for  leftmost  branch  are: 

•  scattering  cross  section  for  photons  of  high  energy 

•  high  energy,  scattering  cross  section  for  photons  of 

Index items for sub-branch of leftmost branch are:

•  scattering  cross  section  for  photons  in  dense  air 

•  dense  air,  scattering  cross  section  for  photons  in 

The  second  index  item  in  each  case  is  the  cross-indexed  entry,  the  leading  noun  unit 
representing  the  terminal  node. 


If  the  leading  noun  unit  contains  adjectives  as  in  this  example,  then  deeper  cross 
indexing  is  possible,  leading  to  the  following  additional  entries: 


•  cross  section,  scattering,  of  mesons  of  high  mass  from  deuterium 

•  cross  section,  scattering,  of  mesons  from  light  nuclei 

•  cross  section,  scattering,  for  photons  of  high  energy 

•  cross  section,  scattering,  for  photons  in  dense  air 
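
The construction rules above can be illustrated with a short Python sketch. It is not the report's program: the `Node` type, the function names, and the particular node sequences are assumptions made for illustration, with the paths inferred from the index items listed in the example. The sketch inserts "of" between two adjacent noun units, stops at a pr* node, moves the terminal noun unit to the front for the cross-indexed entry, and, given a leading noun unit already split into adjective and noun, forms the deeper cross-indexed entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str  # "nut", "pr", "v", or "pr*" (hypothetical labels)
    text: str  # English phrase; for "pr*" the untranslated Russian word


def item_tokens(path):
    """Walk one top-to-bottom path of the tree and collect its phrases."""
    tokens, prev = [], None
    for node in path:
        if node.kind == "pr*":
            break  # the index item terminates at the previous node
        if prev is not None and prev.kind == "nut" and node.kind == "nut":
            tokens.append("of")  # "of" joins two connected noun units
        tokens.append(node.text)
        prev = node
    return tokens


def index_item(path):
    return " ".join(item_tokens(path))


def cross_indexed_item(path):
    """Lead with the terminal node, followed by the rest of the item."""
    tokens = item_tokens(path)
    return f"{tokens[-1]}, {' '.join(tokens[:-1])}"


def deep_cross_indexed_item(path, adjective, noun):
    """Lead with the noun of the leading noun unit, then its adjective."""
    tokens = item_tokens(path)
    return f"{noun}, {adjective}, {' '.join(tokens[1:])}"


mesons = [
    Node("nut", "scattering cross section"),
    Node("nut", "mesons"),
    Node("nut", "high mass"),
    Node("pr", "from"),
    Node("nut", "deuterium"),
]
print(index_item(mesons))
print(cross_indexed_item(mesons))
print(deep_cross_indexed_item(mesons, "scattering", "cross section"))
```

Keeping the connecting words in a single token list lets all three entry forms be produced from one pass over the path.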
Section 7
CONCLUSIONS 


The  algorithm  which  produces  English  translations  for  sequences  of  Russian  text  words 
can  be  viewed  as  the  basic  algorithm  which,  in  the  system  described  here,  is  being 
used  to  produce  an  index. 

This  basic  algorithm  has  several  other  possible  uses. 

7.1  EXTRACTING 

Phrases  occurring  in  a  technical  phrase  dictionary  are,  by  definition,  descriptors 
critical  to  an  understanding  of  a  given  scientific  field.  Since  it  isolates  such  phrases 
in  text,  the  basic  algorithm  can  be  used  to  extract  Russian  sentences,  paragraphs,  or 
even  pages  from  a  larger  body  of  technical  text. 

7.2  TRANSLATION 

The basic algorithm and the dictionary upon which it operates could be used as a closed
subroutine within a larger Russian-English translation system. Such a subroutine
would  produce  translations  of  a  sequence  of  Russian  words  which  occur  in  a  piece  of 
text.  The  English  translation  itself  would,  of  course,  have  to  be  inflected  to  conform 
to  the  syntactic  use  of  the  phrase  within  the  sentence. 

7.3 RETRIEVAL

The basic algorithm gives the capability of creating a unique information retrieval
system: one which accepts English queries and addresses these queries to files of
Russian articles, or more accurately, to files of indexes of Russian articles.

The Russian Retrieval Program follows directly in conception from the Russian-English
Indexing System described in this report and will, in fact, use most of the
computer programs used by the Indexing System.

Retrieval of articles processed by the Indexing System appears to be simple. Russian
articles so processed have been deeply indexed in English. If a user seeking
information from a file of such deep indexes uses the same terminology (i.e., the manual
version of the computer dictionary) as was used to create the index, then a matching
process (user's phrases versus index) in combination with the logical AND and OR
operations will enable the user to address long English queries to the file. In effect
this will lead to retrieval by English queries of Russian technical material.
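
The matching process can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested-tuple query form, the function names, and the article labels are hypothetical, and each article is represented simply as the set of its English index entries.

```python
def matches(query, index_entries):
    """Evaluate a query against one article's set of index entries.

    A query is either a phrase (str) or a tuple ("AND" | "OR", q1, q2, ...).
    Index entries are assumed to be stored in lower case.
    """
    if isinstance(query, str):
        return query.lower() in index_entries
    op, *operands = query
    results = (matches(q, index_entries) for q in operands)
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


def retrieve(query, file_of_indexes):
    """Return the labels of all articles whose index satisfies the query."""
    return sorted(
        label for label, entries in file_of_indexes.items()
        if matches(query, entries)
    )


# Hypothetical file of deep indexes; phrases follow the style of Fig. E-3.
file_of_indexes = {
    "article-1": {"density of earth", "silicate stratum"},
    "article-2": {"density of earth", "iron meteorite, existence of"},
}
query = ("AND", "density of earth", ("OR", "silicate stratum", "silicate coating"))
print(retrieve(query, file_of_indexes))
```

Because the user's phrases must agree exactly with index entries, the sketch works only when the query uses the dictionary's own terminology, which is the condition stated above.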

The  Indexing  and  Retrieval  applications  have  been  discussed  with  the  intention  of 
deriving  English  information  from  Russian  text.  The  logic  involved,  however,  applies 
to  English  as  well.  Thus  the  Russian-English  programs  with  minor  alterations  may 
be used to index (Ref. 7), extract, and retrieve English technical material.

Appendix A

THE FIRST AND SECOND VERB CONJUGATIONS


First Conjugation

                Imperfective                      Perfective

I.  Infinitive:

                читать                            прочитать
                to read, be reading               to have read

II.  Indicative:

    Present Tense

                I read, am reading
                я читаю                           None
                ты читаешь
                он, она, оно читает
                мы читаем
                вы читаете
                они читают

    Past Tense

                I read, was reading               I have, had read
                я читал, ла, ло                   я прочитал, ла, ло
                ты читал, ла, ло                  ты прочитал, ла, ло
                он читал                          он прочитал
                она читала                        она прочитала
                оно читало                        оно прочитало
                мы, вы, они читали                мы, вы, они прочитали

    Future Tense

                I shall read, be reading          I shall have read
                я буду читать                     я прочитаю
                ты будешь читать                  ты прочитаешь
                он, она, оно будет читать         он, она, оно прочитает
                мы будем читать                   мы прочитаем
                вы будете читать                  вы прочитаете
                они будут читать                  они прочитают

III.  Subjunctive (conditional):

    Conjugated exactly like the past tense of the indicative mood
    with the addition of particles бы or б:

                я читал, ла, ло бы (б) etc.       я прочитал, ла, ло бы (б) etc.
                I should read, be reading,        I should have read
                should have been reading

    b.  Passive:

    Present Tense

                Long form:   читаемый             None
                Short form:  читаем
                which is being read

    Past Tense

                Long form:   читанный             прочитанный
                Short form:  читан                прочитан
                which was read                    which has, had been read

    (Other past passive participle endings are: long -тый, short -т.)

VII.  Passive:

    The passive is constructed by means of the short passive participle
    forms, present or past (see directly above); also by means
    of the reflexive form.

Second Conjugation

                Imperfective                      Perfective

III.  Subjunctive (conditional):

    Conjugated exactly like the past tense of the indicative mood
    with the addition of particles бы (б):

                я курил, ла, ло бы (б) etc.       я выкурил, ла, ло бы (б) etc.
                I should smoke, be smoking,       I should have smoked
                should have been smoking

IV.  Imperative:

                кури!                             выкури!
                курите!                           выкурите!
                smoke!                            smoke! finish smoking!

V.  Adverbial participles:

    Present Tense

                куря                              None
                smoking, while smoking

    Past Tense

                куривши                           выкуривши
                курив                             выкурив
                while (I, etc.) was smoking       having smoked

VI.  Participles:

    a.  Active:

    Present Tense

                курящий                           None
                one who is smoking

    Past Tense

                куривший                          выкуривший
                one who was smoking               one who has, had smoked

    b.  Passive:

    Present Tense

                Long form:   куримый              None
                Short form:  курим
                which is being smoked

    Past Tense

                Long form:   куренный             выкуренный
                Short form:  курен                выкурен
                which was smoked                  which has, had been smoked

    (Other past passive participle endings are: long -тый, short -т.)

VII.  Passive:

    The passive is constructed by means of the short passive participle
    forms, present or past (see directly above); also by means
    of the reflexive form.


Appendix B

REVERSE  INFLECTION  ALGORITHM 


The  algorithm  operates  on  a  dictionary  which  contains  classically  defined  canonical 
forms:  nominative  singular  of  nouns,  nominative  singular  masculine  gender  for 
adjectives,  and  the  infinitive  for  verbs  and  participles. 

A  word  encountered  in  text  has  a  terminal  string  of  letters  removed  by  the  algorithm 
and  a  new  terminal  string  added  to  form  a  "pseudo-word."  The  pseudo-word  is  an 
attempt  on  the  part  of  the  algorithm  to  construct  the  text  word's  canonical  form.  If 
the  pseudo-word  does  in  fact  exist  in  the  dictionary,  the  algorithm  proceeds  to  examine 
the  next  word. 

If  the  pseudo-word  does  not  exist  in  the  dictionary,  a  new  pseudo-word  is  constructed 
(terminal  string  removed,  new  terminal  string  added).  The  process  continues  for  a 
given  word  until  a  true  canonical  form  is  constructed  or  until  all  of  the  text  word's 
possible  constructions  have  been  exhausted. 

Pseudo-word construction takes place for all of a word's potential parts of speech.
Thus, the algorithm assumes a word is a verb, noun, adjective, participle, in that
order, and constructs, if possible, a set of pseudo-words for each part of speech.

The  algorithm  operates  upon  a  table  which  has  a  3-level  structure.  The  levels  are  as 
follows : 

(1)  Terminal  letter  for  a  given  part  of  speech 

(2)  Possible  suffixes  ending  in  the  terminal  letter  for  this  part  of  speech 

(3)  Canonical  suffixes  to  be  added  to  the  stem  after  suffixes  of  level  2  have 
been  removed 




Example: This example shows the 3-level structure for verbs ending in Ю.

(1)  Terminal letter:   Ю

(2)  Suffixes:          -Ю   -ЕЮ   -ЛЮ   -ОЮ   -УЮ   -ЬЮ   -ЮЮ   -ШЛЮ

(3)  Canonical suffixes:

     -Ю      -ТЬ   -ЯТЬ   -ЕТЬ   -ИТЬ   -ОТЬ   -УТЬ   -ВАТЬ
     -УЮ     -ОВАТЬ   -ЕВАТЬ
     -ЮЮ     -ЕВАТЬ
     -ШЛЮ    -СЛАТЬ
     -ЕЮ, -ЛЮ, -ОЮ, -ЬЮ:   -ИТЬ   -АТЬ   -ЕТЬ   -ИТЬ   -ИТЬ   -ЕТЬ
                           (column alignment not recoverable)

Now suppose ПОЛУЧАЮ occurs in text. The algorithm assumes first it is a verb.
It examines the terminal letter, finds that it is Ю and that Ю has eight possible
verbal suffixes. In this case, only a single suffix, Ю, is contained in the word.
Ю is removed from the text word and the seven canonical suffixes are added to the
stem to form seven pseudo-words. The suffix (-ТЬ) gives a true canonical form.
If, now, ПОЛУЧАТЬ is in the dictionary, a match will be made.

    text           remove Ю         add suffixes

    ПОЛУЧАЮ   ->   ПОЛУЧА     ->    ПОЛУЧА + ВАТЬ
                                             УТЬ
                                             ОТЬ
                                             ИТЬ
                                             ЕТЬ
                                             ТЬ      (true form)
                                             ЯТЬ

It may be objected that the algorithm simulates too closely human processes, that it is
illogical, inelegant even, to remove strings of letters only to add new strings. Why
not, for example, restrict the algorithm to the removal operation? That is, make the
canonical entries in the dictionary be some stripped form of the real word. Then the
algorithm need only remove endings and compare the stripped word to canonical entries
which have also been stripped.


At the time the algorithm was developed, it was considered desirable to put as much
of the translation burden as possible on the computer. If a stripped form of a word is
used as the canonical entry, how is the stripped form arrived at? It must be decided
by human analysis or by some computer algorithm. Either way, additional labor,
human or machine, is necessary. It is not, then, a question of two operations versus
one (removal and addition versus removal) but of two operations versus two
operations (removal and addition versus removal and removal). In the first case,
removal and addition occur in the same algorithm. In the second case, the first
removal operation is performed by a human or a computer, but in either case it takes
place independently and prior to the second removal operation.

Further,  the  algorithm  was  designed  for  use  with  a  parsing  program  and  it  was  felt 
that  there  was  important  syntactic  information  which  was  characteristic  of  classically 
defined  words  that  would  disappear  if  a  true  word  were  reduced,  in  effect,  to  a 
"non-word."  (We  are  thinking  here  of  the  phenomena  of  syntactic  and  semantic 
government. ) 
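
The remove-and-add procedure of this appendix can be sketched in Python as follows. The sketch is illustrative only: the table holds a small assumed subset of the verb entries for the terminal letter Ю, the function names are hypothetical, and the order in which candidate suffixes are tried is an assumption, since the report does not specify it.

```python
# Level 1: terminal letter -> level 2: suffix -> level 3: canonical suffixes.
# Illustrative subset only, not the full table used by the system.
VERB_TABLE = {
    "Ю": {
        "Ю": ["ВАТЬ", "УТЬ", "ОТЬ", "ИТЬ", "ЕТЬ", "ТЬ"],
        "УЮ": ["ОВАТЬ", "ЕВАТЬ"],
        "ШЛЮ": ["СЛАТЬ"],
    },
}


def pseudo_words(word, table):
    """Yield pseudo-words for `word` under one part of speech."""
    suffixes = table.get(word[-1:], {})
    for suffix, canonical_suffixes in suffixes.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]        # remove the terminal string
            for canonical in canonical_suffixes:
                yield stem + canonical         # add a new terminal string


def reverse_inflect(word, tables_by_pos, dictionary):
    """Return (part of speech, canonical form) for the first pseudo-word
    found in the dictionary, or None if all constructions are exhausted.

    `tables_by_pos` is an ordered list such as
    [("verb", ...), ("noun", ...), ("adjective", ...), ("participle", ...)].
    """
    for pos, table in tables_by_pos:
        for candidate in pseudo_words(word, table):
            if candidate in dictionary:
                return pos, candidate
    return None


dictionary = {"ПОЛУЧАТЬ", "РИСОВАТЬ"}
tables = [("verb", VERB_TABLE)]
print(reverse_inflect("ПОЛУЧАЮ", tables, dictionary))
```

Because the dictionary holds true canonical forms, the lookup itself decides which pseudo-word is a real word; no stripped forms need to be prepared in advance.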


Appendix  C 

COMPRESSION  OF  PHRASES  INTO  A  LOGICAL  SUM 

The phrase to be compressed is

    РАССТОЯНИЕ МЕЖДУ ЧАСТИЦАМИ

    interparticle distance

The IBM 7094 allows six alphanumeric characters to be stored in a single machine
word.

    Р  А  С  С  Т  О
    Я  Н  И  Е  0  М
    Е  Ж  Д  У  0  Ч
    А  С  Т  И  Ц  А
    М  И  0  0  0  0

Note that blanks have been replaced by zeros and that the final machine word has also
been padded out with zeros.

The logical sum can now be calculated. (The actual numbers shown below are the
numerical representation in core of the corresponding Russian letters above.)

                 51  21  62  62  63  46
                 13  45  31  25  00  44
                 25  33  24  64  00  04
                 21  62  63  31  23  21
                 44  31  00  00  00  00
                 ----------------------
                 00  36  24  25  07  37

    Final Sum    00  36  24  25  07  41

The carry of 2 out of the leftmost position is added to the rightmost position to give
the final sum.
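
The compression can be sketched in Python as follows. This is a minimal illustration under assumptions: the six-bit octal character codes are taken from the worked example as best they can be read, the function names are hypothetical, and the logical sum is assumed to be a 36-bit addition in which any carry out of the high-order end is added back at the low-order end, which is consistent with the figures in the example.

```python
WORD_CHARS = 6                 # six characters per 36-bit machine word
WORD_MASK = (1 << 36) - 1

# Octal character codes as read from the worked example (assumed where unclear).
CODES = {
    "Р": 0o51, "А": 0o21, "С": 0o62, "Т": 0o63, "О": 0o46, "Я": 0o13,
    "Н": 0o45, "И": 0o31, "Е": 0o25, "М": 0o44, "Ж": 0o33, "Д": 0o24,
    "У": 0o64, "Ч": 0o04, "Ц": 0o23, "0": 0o00,
}


def pack(phrase):
    """Replace blanks by zeros, pad with zeros, and pack six codes per word."""
    text = phrase.replace(" ", "0")
    text += "0" * (-len(text) % WORD_CHARS)
    words = []
    for start in range(0, len(text), WORD_CHARS):
        value = 0
        for ch in text[start:start + WORD_CHARS]:
            value = (value << 6) | CODES[ch]
        words.append(value)
    return words


def logical_sum(phrase):
    """Add the machine words, folding any carry back into the low-order end."""
    total = sum(pack(phrase))
    while total > WORD_MASK:
        total = (total & WORD_MASK) + (total >> 36)
    return total


print(format(logical_sum("РАССТОЯНИЕ МЕЖДУ ЧАСТИЦАМИ"), "012o"))
```

The result is a single machine word regardless of phrase length, which is what allows a whole phrase to be compared against dictionary entries in one operation.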


Appendix  D 

CONCEPTUAL  DESCRIPTION  OF  RUSSIAN  TEXT  PHRASE  TRANSLATOR 

D.1 INTRODUCTION

The Russian text phrase translator consists of two operationally distinct parts; the
first being a computer-generated phrase dictionary, the second a computer-generated
textual analysis which assigns to every text interval an English phrase and grammatical
function. The English phrase is either a translation of the text interval or a statement
to the effect that the text interval is neither a dictionary phrase nor an inflected form
of a dictionary phrase.

In this description of the Translator, the following notations will be adhered to.
Collections of phrases will be denoted by capital script letters; $\mathcal{R}_a$,
$\mathcal{R}_b$, ..., will denote the subfamily of $\mathcal{R}$ all of whose translations
begin with the letter a, b, .... Phrases will be denoted by capital Roman letters:
$R$, $E$, .... The English translation of a Russian phrase $R$ will be denoted by $E(R)$.
Words will be denoted by small Greek letters $\omega$, $\sigma$, .... Small Roman letters
will denote numbers or themselves. To any Russian word $\omega$ (phrase $R$) is associated
its canonical form $\bar{\omega}$ ($\bar{R}$) and its numerical logical sum $g(\omega)$
[$g(R)$].

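D.2 THE PHRASE DICTIONARY

Let $\mathcal{R}$ be a family of Russian phrases in which the left-most noun or verb
occurs in canonical form. Let $\mathcal{E}(\mathcal{R})$ be the family of English
translations. For each $R \in \mathcal{R}$ a representative word in canonical form is
algorithmically determined ($\bar{\omega} \in R$) as well as its part of speech (POS) and
its imbedding coordinates $x(\bar{\omega}, R)$, $y(\bar{\omega}, R)$ relative to $R$; that
is, if $R = \omega_1 \omega_2 \cdots \omega_j \cdots \omega_k$ ($\bar{\omega} = \omega_j$),
then $x(\bar{\omega}, R)$ and $y(\bar{\omega}, R)$ give the position of $\omega_j$ relative
to the left and right ends of $R$. Let now
$\mathcal{R}_a(\bar{\omega}) \subset \mathcal{R}_a$ be that subfamily of $\mathcal{R}_a$
whose canonical representative word is $\bar{\omega}$. The a-maximal coordinates of
$\bar{\omega}$ are defined as

$$x(a) = \max_{R \in \mathcal{R}_a(\bar{\omega})} x(\bar{\omega}, R), \qquad
  y(a) = \max_{R \in \mathcal{R}_a(\bar{\omega})} y(\bar{\omega}, R)$$

Conceptually, a typical entry in the phrase dictionary is

$$(g(R),\ E(R),\ \bar{\omega},\ \mathrm{POS},\ x(\cdot),\ y(\cdot))$$

The a-section of the dictionary may be written as

$$\bigcup_{R \in \mathcal{R}_a} (g(R),\ E(R),\ \bar{\omega},\ \mathrm{POS},\ x(a),\ y(a))$$

and the entire phrase dictionary is

$$\bigcup_{(\cdot) = a, b, \ldots}\ \bigcup_{R \in \mathcal{R}_{(\cdot)}}
  (g(R),\ E(R),\ \bar{\omega},\ \mathrm{POS},\ x(\cdot),\ y(\cdot))$$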

D.3 THE TEXT ANALYSIS

A prescribed Russian text will be regarded as an ordered set of n words. Each and
every word interval $(a, b)$, $a \le b$, will be regarded as a text phrase. Denote by
$\mathcal{T}$ the set of text phrases, by $\mathcal{E}(\mathcal{R})$ the set of English
phrases in the phrase dictionary.

Conceptually the textual analysis may be signified as

$$F: \mathcal{T} \to (\mathcal{E}(\mathcal{R}),\ \text{grammatical function})$$

Operationally only the dictionary significant phrases in $\mathcal{T}$ are analyzed, the
others being assigned an English phrase by fiat.

The analysis, interpretation, and production of data by the textual analysis algorithm
are perhaps most succinctly described in the following conceptual flow chart. The
notations are as described in the introduction. Let
$\mathcal{D}_a = \{(g(R), E(R), \bar{\omega}, \mathrm{POS}, x(a), y(a))\}$
be the a-section of the phrase dictionary. Let $\omega_i$ be the i-th word in the ordered
text:

STEP 1: Canonical form

    $(\omega, i) \to (\bar{\omega}, i)$

STEP 2: Lookup $\bar{\omega}$ in $\mathcal{D}_a$ (if not successful go to
$\mathcal{D}_b$, ...; failing, go to $\omega_{i+1}$)

    $(\bar{\omega}, i) \to (\bar{\omega}, i, \mathrm{POS}, x(a), y(a))$

STEP 3: Formation of phrases in which $\omega_i$ is imbedded

    $(\bar{\omega}, i, \mathrm{POS}, x(a), y(a)) \to \{(\bar{\omega}, i, \mathrm{POS}, T)\}$,
    $\omega_i \in T \subset [i - x(a),\ i + y(a)]$

STEP 4: Canonical form

    $\{(\bar{\omega}, i, \mathrm{POS}, T)\} \to \{(i, \mathrm{POS}, \bar{T})\}$

STEP 5: Logical sum

    $\{(i, \mathrm{POS}, \bar{T})\} \to \{(i, \mathrm{POS}, g(\bar{T}))\}$

STEP 6: Phrase dictionary look up over $\{g(\bar{T})\}$

The set of $x(a)\,y(a)$ triples $(i, \mathrm{POS}, g(\bar{T}))$ are ordered according to
the length of the phrase $T$, the ordering amongst equal length phrases being indifferent.
The ordered logical sums $\{g(\bar{T})\}$ are then matched against the set $\{g(R)\}$
appearing in $\mathcal{D}_a$. Assuming the first agreement occurs for
$g(\bar{T}) = g(S)$, $S \in \mathcal{R}_a$, the algorithm then produces the information
vector

    $\{(i, \mathrm{POS}, g(\bar{T}))\} \to (i, \mathrm{POS}, g(S), E(S), x(\bar{\omega}, S), y(\bar{\omega}, S))$

STEP 7:

    $F: (i - x(\bar{\omega}, S),\ i + y(\bar{\omega}, S)) \to (E(S), \mathrm{POS})$

STEP 8: After an agreement has occurred (or if no agreement occurs in STEP 6), the
textual analysis algorithm is repeated for $\omega_{i+1}$ provided the indicated
subscript is $\le n$. Otherwise the text phrase translation is terminated by assigning
to all non-determined text interval phrases $T \in \mathcal{T}$ the English phrase
"not significant," and grammatical function "none."
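
The flow of STEPS 1 through 8 can be sketched in Python as follows. This is an illustration under several assumptions, not the report's program: phrases are matched as tuples of canonical words rather than by logical sum, every word of a text phrase is reduced to canonical form by a caller-supplied function, the coordinates x and y are taken to be the number of words to the left and right of the representative word, candidate phrases are tried longest first, and all names and sample entries are hypothetical.

```python
from collections import defaultdict


def build_dictionary(entries):
    """Group entries (words, english, rep_index, pos) by representative word."""
    by_word = defaultdict(list)
    for words, english, rep_index, pos in entries:
        x = rep_index                     # words left of the representative word
        y = len(words) - 1 - rep_index    # words right of it
        by_word[words[rep_index]].append((tuple(words), english, pos, x, y))
    return by_word


def analyse(text, by_word, canonical):
    """Map text intervals (a, b) to (English phrase, part of speech)."""
    canon = [canonical(w) for w in text]              # STEPS 1 and 4
    found = {}
    for i, word in enumerate(canon):
        entries = by_word.get(word)                   # STEP 2
        if not entries:
            continue
        x_max = max(e[3] for e in entries)
        y_max = max(e[4] for e in entries)
        by_phrase = {e[0]: e for e in entries}
        intervals = [                                 # STEP 3
            (a, b)
            for a in range(max(0, i - x_max), i + 1)
            for b in range(i, min(len(text) - 1, i + y_max) + 1)
        ]
        intervals.sort(key=lambda ab: ab[1] - ab[0], reverse=True)
        for a, b in intervals:                        # STEP 6
            entry = by_phrase.get(tuple(canon[a:b + 1]))
            if entry:
                found[(a, b)] = (entry[1], entry[2])  # STEP 7
                break
    return found                                      # unlisted intervals: "not significant"


entries = [
    (("СРЕДНЯЯ", "ПЛОТНОСТЬ"), "average density", 1, "noun"),
    (("ПЛОТНОСТЬ",), "density", 0, "noun"),
    (("ЗЕМЛЯ",), "earth", 0, "noun"),
]
forms = {"СРЕДНЮЮ": "СРЕДНЯЯ", "ЗЕМЛИ": "ЗЕМЛЯ"}
result = analyse(["СРЕДНЮЮ", "ПЛОТНОСТЬ", "ЗЕМЛИ"],
                 build_dictionary(entries), lambda w: forms.get(w, w))
print(result)
```

Restricting the candidate intervals to the window given by the maximal coordinates keeps the number of dictionary comparisons per text word small.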
Appendix E

OUTPUT  OF  THE  INDEXING  SYSTEM 


The first few paragraphs of a geological article entitled "Phase Transformations in
the Interior of the Earth," by S. M. Stishov, Nature, Sep 1962, were used as input to
the Indexer (Fig. E-1). The simple and complex indexes are listed in Figs. E-2 and
E-3. Single words have been eliminated from the complex index, though not from the
simple. The computer dictionary is based on Sofiano's geological dictionary (Ref. 1).


ФАЗОВЫЕ ПРЕВРАЩЕНИЯ
В ГЛУБИНАХ ЗЕМЛИ


Достижения техники и физики в XVII-XVIII вв. позволили определить массу
и среднюю плотность Земли. Последняя оказалась равной 5,5 г/см3. А так как
плотность наиболее тяжелых пород на поверхности Земли не превышает 3,3 г/см3,
то, естественно, возникло представление, что плотность Земли увеличивается
с глубиной.

Факты существования железных метеоритов, а также и в прошлом популярная
теория происхождения Земли из горячего вещества Солнца привели многих ученых
к мысли о концентрации железа в центре Земли. Примечательно, что уже вполне
определенное высказывание французского геолога А. Добрэ в 1866 г. о железном
ядре Земли вскоре получило поддержку со стороны сейсмологов, которым в конце
XIX и начале XX в. удалось установить наличие в Земле ядра.

В 20-х годах текущего столетия В. М. Гольдшмидт (Норвегия) и немецкий
физико-химик Г. Тамман развили представление о том, что в первоначально
расплавленной Земле происходило разделение (дифференциация) веществ по их
плотности, аналогично тому, что мы имеем, например, при плавке сульфидных
руд. При этом процессе получаются три слоя: шлак (силикатный слой), штейн
(смесь сульфидов и металла) и собственно металл. Согласно этой гипотезе,
в Земле имеем также следующие слои: силикатный и сульфидный (оболочка Земли)
и металлический, состоящий из железа с примесью никеля (ядро Земли).

Американские ученые Ф. Кларк, Г. Вашингтон, Л. Адамс и др. не выделяли
сульфидный слой; они полагали, что между железным ядром и силикатной
оболочкой находится промежуточная область, состоящая из смеси силикатов
и железа.

Теория слоистой, химически дифференцированной Земли, во многом подкреплялась
данными сейсмологов, которые первоначально считали, что в мантии (оболочке)
Земли, т. е. в той ее части, которая расположена между земной корой и ядром,
существует много границ раздела.

В дальнейшем успехи геофизики и космогонии, связанные главным образом с имена-

Fig. E-1 Russian Text Used for English Indexing


AVERAGE DENSITY          MELTING
CENTER                   METAL
CHEMICAL                 METALLIC
COATING                  MIXTURE
DENSITY                  ORIGIN
DIFFERENTIATED           PROCESS
DIFFERENTIATION          REGION
EARTH                    ROCK
EARTH SHELL              SIDE
GEOLOGIST                SILICATE
HYPOTHESIS               SLAG
IMPURITY                 STRATUM
INTERMEDIATE             SULPHIDE ORE
IRON                     SULPHITE
IRON METEORITE           SUPPORT
LIMIT                    SURFACE
MANTLE                   THEORY
MATTER

Fig. E-2 Simple Index


AVERAGE DENSITY OF EARTH
COATING, SILICATE,
DENSITY OF EARTH
DIFFERENTIATION OF MATTER
EARTH, AVERAGE DENSITY OF
EARTH, DENSITY OF
EARTH, IRON IN CENTER OF
EARTH, SURFACE OF
EARTH, THEORY OF ORIGIN OF
EARTH SHELL
EXISTENCE OF IRON METEORITE
IMPURITY, IRON WITH
INTERMEDIATE REGION
IRON IN CENTER OF EARTH
IRON METEORITE, EXISTENCE OF
IRON WITH IMPURITY
MATTER, DIFFERENTIATION OF
MELTING OF SULPHIDE ORE
METAL, MIXTURE OF SULPHITE AND
MIXTURE OF SULPHITE AND METAL
REGION, INTERMEDIATE,
SILICATE COATING
SILICATE STRATUM
STRATUM, SILICATE,
SULPHIDE ORE, MELTING OF
SURFACE OF EARTH
THEORY OF ORIGIN OF EARTH


Fig. E-3 Complex Index